Abstract

The large deviations principle for the empirical measure for both continuous and discrete time Markov processes is well known. Various expressions are available for the rate function, but these expressions are usually given as the solution to a variational problem, and in this sense are not explicit. An interesting class of continuous time, reversible processes was identified in the original work of Donsker and Varadhan for which an explicit expression is possible. While this class includes many (reversible) processes of interest, it excludes the case of continuous time pure jump processes, such as a reversible finite state Markov chain. In this paper we study the large deviations principle for the empirical measure of pure jump Markov processes and provide an explicit formula for the rate function under reversibility.

1 Introduction

Let $X(t)$ be a time homogeneous Markov process with Polish state space $S$, and let $P(t,x,dy)$ be the transition function of $X(t)$. For $t\in[0,\infty)$, define $T_t$ by

$$T_tf(x)=\int_S f(y)\,P(t,x,dy).$$

Then $T_t$ is a contraction semigroup on the Banach space of bounded, Borel measurable functions on $S$ [6, Chapter 4.1]. We use $\mathcal{L}$ to denote the infinitesimal generator of $T_t$ and $\mathcal{D}$ the domain of $\mathcal{L}$ (see [4, Chapter 1]). Hence for each bounded measurable function $f\in\mathcal{D}$,

$$\mathcal{L}f(x)=\lim_{t\downarrow 0}\frac{1}{t}\left[\int_S f(y)\,P(t,x,dy)-f(x)\right].$$
The empirical measure (or normalized occupation measure) up to time $T$ of the Markov process $X(t)$ is defined as

$$\eta_T(\cdot)=\frac{1}{T}\int_0^T\delta_{X(t)}(\cdot)\,dt. \tag{1.1}$$

Let $\mathcal{P}(S)$ be the metric space of probability measures on $S$ equipped with the Lévy-Prohorov metric, which is compatible with the topology of weak convergence. For $\eta\in\mathcal{P}(S)$, define

$$I(\eta)=-\inf_{u\in\mathcal{D},\,u>0}\int_S\frac{\mathcal{L}u}{u}\,d\eta. \tag{1.2}$$

It is easy to check that $I$ thus defined is lower semicontinuous under the topology of weak convergence. Consider the following regularity assumption.

Condition 1.1 There exists a probability measure $\lambda$ on $S$ such that for $t>0$ the transition functions $P(t,x,dy)$ have densities with respect to $\lambda$, i.e.,

$$P(t,x,dy)=p(t,x,y)\,\lambda(dy). \tag{1.3}$$

Under additional recurrence and transitivity conditions, Donsker and Varadhan [2, 3] prove the following. For any open set $O\subset\mathcal{P}(S)$

$$\liminf_{T\to\infty}\frac{1}{T}\log P(\eta_T(\cdot)\in O)\ge-\inf_{\eta\in O}I(\eta), \tag{1.4}$$

and for any closed set $C\subset\mathcal{P}(S)$

$$\limsup_{T\to\infty}\frac{1}{T}\log P(\eta_T(\cdot)\in C)\le-\inf_{\eta\in C}I(\eta). \tag{1.5}$$

We refer to (1.4) as the large deviation lower bound and (1.5) as the large deviation upper bound. Under ergodicity, the empirical measure $\eta_T$ converges to the invariant distribution of the Markov process $X(t)$. The large deviation principle characterizes this convergence through the associated rate function. While there are many situations where an explicit formula for (1.2) would be useful, it is in general difficult to solve the variational problem. The main existing results on this issue are for the self-adjoint case in the continuous time setting; see [2, 9, 11]. Specifically, suppose there is a $\sigma$-finite measure $\psi$ on $S$, and that the densities in (1.3) satisfy the following reversibility condition:

$$p(t,x,y)=p(t,y,x)\quad\text{almost everywhere }(\psi\otimes\psi). \tag{1.6}$$

Then $T_t$ is self-adjoint. If we denote the closure of $\mathcal{L}$ by $\bar{\mathcal{L}}$ (see, e.g., [6, p. 16]) and the domain of $\bar{\mathcal{L}}$ by $\mathcal{D}(\bar{\mathcal{L}})$, then $\bar{\mathcal{L}}$ is self-adjoint and negative semidefinite (since $T_t$ is a contraction). We denote by $(-\bar{\mathcal{L}})^{1/2}$ the canonical positive semidefinite square root of $-\bar{\mathcal{L}}$ [10, Chapter 12]. Let $\mathcal{D}_{1/2}$ be the domain of $(-\bar{\mathcal{L}})^{1/2}$. Donsker and Varadhan [2, Theorem 5] show under certain conditions that $I$ defined by (1.2) has the following properties: $I(\mu)<\infty$ if and only if $\mu\ll\psi$ and $(d\mu/d\psi)^{1/2}\in\mathcal{D}_{1/2}$, and with $f=d\mu/d\psi$ and $g=f^{1/2}$,

$$I(\mu)=\left\|(-\bar{\mathcal{L}})^{1/2}g\right\|^2, \tag{1.7}$$

where $\|\cdot\|$ denotes the norm in $L^2(\psi)$. Typically, $\psi$ is taken to be the invariant distribution of the process.

It should be noted that this explicit formula does not apply to one of the simplest Markov processes, namely, continuous time Markov jump processes with bounded infinitesimal generators. Let $\mathcal{B}(S)$ be the Borel $\sigma$-algebra on $S$ and let $a(x,\Gamma)$ be a transition kernel on $S\times\mathcal{B}(S)$. Let $B(S)$ denote the space of bounded Borel measurable functions on $S$ and let $q\in B(S)$ be nonnegative. Then

$$\mathcal{L}f(x)=q(x)\int_S\big(f(y)-f(x)\big)\,a(x,dy) \tag{1.8}$$

defines a bounded linear operator on $B(S)$, and $\mathcal{L}$ is the generator of a Markov process that can be constructed as follows. Let $\{X_n,n\in\mathbb{N}\}$ be a Markov chain in $S$ with transition probability $a(x,\Gamma)$, i.e.,

$$P(X_{n+1}\in\Gamma\mid X_0,X_1,\ldots,X_n)=a(X_n,\Gamma) \tag{1.9}$$

for all $\Gamma\in\mathcal{B}(S)$ and $n\in\mathbb{N}$. Let $\tau_1,\tau_2,\ldots$ be independent and exponentially distributed with mean 1, and independent of $\{X_n,n\in\mathbb{N}\}$. Define a sojourn time $s_i$ for each $i=1,2,\ldots$ by

$$q(X_{i-1})\,s_i=\tau_i. \tag{1.10}$$

Then

$$X(t)=X_n\quad\text{for}\quad\sum_{i=1}^{n}s_i\le t<\sum_{i=1}^{n+1}s_i$$

(with the convention $\sum_{i=1}^{0}s_i=0$) defines a Markov process $\{X(t),t\in[0,\infty)\}$ with infinitesimal generator $\mathcal{L}$, and we call this process a Markov jump process.
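The construction (1.9)-(1.10) translates directly into a simulation. The following is a minimal Python sketch, assuming a finite state space $S=\{0,\ldots,n-1\}$ and strictly positive intensities (assumptions made only for illustration; the paper allows a general compact Polish $S$). The function name `simulate_empirical_measure` is hypothetical. It returns the empirical measure (1.1) of one simulated path on $[0,T]$.

```python
import random

def simulate_empirical_measure(a, q, x0, T, rng):
    """Simulate the Markov jump process of (1.9)-(1.10) on [0, T] and return eta_T from (1.1).

    a[x][y]: transition probabilities of the embedded chain {X_n}
    q[x]:    jump intensities, assumed strictly positive in this sketch
    """
    n = len(q)
    occupation = [0.0] * n          # time spent in each state up to time T
    t, x = 0.0, x0
    while t < T:
        tau = rng.expovariate(1.0)  # tau_i, exponential with mean 1
        s = tau / q[x]              # sojourn time s_i = tau_i / q(X_{i-1}), as in (1.10)
        occupation[x] += min(s, T - t)
        t += s
        x = rng.choices(range(n), weights=a[x])[0]  # X_i drawn from a(X_{i-1}, .)
    return [w / T for w in occupation]
```

For example, with `a = [[0, 0.5, 0.5], [0.5, 0, 0.5], [0.5, 0.5, 0]]` and `q = [1, 2, 4]`, the embedded chain has the uniform invariant distribution, and for large $T$ the returned vector should be close to $(4/7, 2/7, 1/7)$, which is proportional to $1/q$.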

A very simple special case is as follows. Using the notation above, assume $S=[0,1]$, $q\equiv 1$ and for each $x\in[0,1]$, $a(x,\cdot)$ is the uniform distribution on $[0,1]$. The infinitesimal generator $\mathcal{L}$ defined in (1.8) reduces to

$$\mathcal{L}f(x)=\int_0^1 f(y)\,dy-f(x),$$

which is clearly self-adjoint with respect to Lebesgue measure. Now let $C$ be the collection of all Dirac measures on $S$; then $C$ is closed under the topology of weak convergence on $\mathcal{P}(S)$. Hence a large deviation upper bound would imply

$$\limsup_{T\to\infty}\frac{1}{T}\log P(\eta_T\in C)\le-\inf_{\mu\in C}I(\mu). \tag{1.11}$$

However, the probability that the very first exponential holding time is bigger than $T$ is exactly $e^{-T}$, and when this happens, the empirical measure is a Dirac measure located at some point that is uniformly distributed on $[0,1]$. Hence

$$\liminf_{T\to\infty}\frac{1}{T}\log P(\eta_T(\cdot)\in C)\ge\liminf_{T\to\infty}\frac{1}{T}\log P(\tau_1>T)=-1.$$

In fact, we will prove later that the rate function for the empirical measure of this Markov jump process never exceeds 1. However, if the upper bound held with the function defined in (1.7), one would have $I(\delta_a)=\infty$ for $a\in[0,1]$, and by (1.11)

$$\limsup_{T\to\infty}\frac{1}{T}\log P(\eta_T(\cdot)\in C)=-\infty,$$

which is impossible.

This example shows that Markov jump processes of this type are not covered by [2, 3]. In fact, for any $t>0$, by considering the case where the exponential holding time is greater than $t$ and the case where it is smaller than $t$, the transition function $P(t,x,dy)$ takes the following form

$$P(t,x,dy)=e^{-t}\,\delta_x(dy)+\left(1-e^{-t}\right)1_{[0,1]}(y)\,dy,$$

which means that we cannot find a reference probability measure $\lambda$ on $S$ such that $P(t,x,\cdot)$ has a density with respect to $\lambda(\cdot)$ for almost all $x\in S$ and $t>0$. This violates Condition 1.1 used in [2, 3], as well as the form of reversibility needed for (1.7).
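The form of $P(t,x,dy)$ in the uniform example can be checked by simulation. The sketch below (Python; the function name `sample_uniform_example` and the values of $t$, $x$ and the sample size are hypothetical choices for illustration) runs the process with $q\equiv 1$ and uniform jumps from a fixed start $x$. The fraction of samples with $X(t)=x$ should be close to $e^{-t}$, the mass of the atom $e^{-t}\delta_x$, and the remaining samples should look uniform on $[0,1]$.

```python
import math
import random

def sample_uniform_example(x, t, rng):
    """Sample X(t) for S = [0, 1], q = 1, a(x, .) = Uniform[0, 1], started at X(0) = x."""
    clock, position = rng.expovariate(1.0), x  # first holding time, Exp(mean 1) since q = 1
    while clock <= t:                           # a jump occurs before time t
        position = rng.random()                 # the new position is uniform on [0, 1]
        clock += rng.expovariate(1.0)
    return position

rng = random.Random(1)
t, x, n = 0.7, 0.3, 100_000
samples = [sample_uniform_example(x, t, rng) for _ in range(n)]
stayed = sum(1 for s in samples if s == x) / n   # estimate of P(t, x, {x})
moved = [s for s in samples if s != x]
print(stayed, math.exp(-t))                      # should be close to each other
print(sum(moved) / len(moved))                   # close to 1/2 for the uniform part
```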

A condition such as Condition 1.1 holds naturally for Markov processes that possess a "diffusive" term in the dynamics, which is not the case for Markov jump processes, and the form of the rate function given in (1.7) will not be valid for this type of process either. The purpose of the current paper is to establish a large deviation principle for the empirical measures of reversible Markov jump processes, and to provide an explicit formula for the rate function like the one given in (1.7). We also show that the boundedness of the rate function results from the fact that the exponential holding times can be tilted, at bounded relative entropy cost, so as to reach target measures that are not absolutely continuous with respect to the invariant distribution.

Finally we should mention that [1] evaluates (1.2) for certain classes of measures when $\mathcal{L}$ is the generator of a jump Markov process satisfying various conditions. However, it does not present an expression for an arbitrary measure, and indeed it appears that the authors are unaware that (1.7) is not the correct rate function for such processes, or that the large deviation principle had not been established.

The paper is organized as follows. In Section 2 we identify the assumptions on the process. In Section 3 we state the main result, Theorem 3.1. The proof of Theorem 3.1 is divided into two sections: Section 4 for the upper bound and Section 5 for the lower bound. In the last section, we discuss the special feature of Markov jump processes that leads to the boundedness of the rate function.
2 Assumptions

Our first assumption is that the Polish state space $S$ is compact. While compactness is not needed, it allows us to focus on the novel features of the problem. For standard techniques to deal with the non-compact case see, e.g., [3].

A construction of Markov jump processes was given in the Introduction, and we continue to use the notation introduced there. The jump intensity $q$ in (1.8) is assumed to be continuous on $S$, and there exist $0<K_1<K_2<\infty$ such that

$$K_1\le q(x)\le K_2. \tag{2.1}$$

To ensure ergodicity of $X(t)$, we need several conditions on the transition function $a$ in (1.9). Recall that $\mathcal{P}(S)$ is the metric space of probability measures on $S$ equipped with the Lévy-Prohorov metric, which is compatible with the topology of weak convergence.

Condition 2.1 $a$ satisfies the Feller property. That is, $a(x,\cdot):S\to\mathcal{P}(S)$ is continuous in $x$.



Remark 2.2 The Feller property and the compactness of $S$ guarantee that $a$ has an invariant distribution [4, Proposition 8.3.4], which we denote by $\tilde\pi$. The boundedness of $q$ enables us to define a probability measure $\pi$ according to

$$\pi(A)=\frac{\int_A q(x)^{-1}\,\tilde\pi(dx)}{\int_S q(x)^{-1}\,\tilde\pi(dx)}. \tag{2.2}$$

Since $\tilde\pi$ is invariant under $a$, i.e., $\tilde\pi(\cdot)=\int_S a(x,\cdot)\,\tilde\pi(dx)$, we have

$$\int_S(\mathcal{L}f(x))\,\pi(dx)=\frac{1}{\int_S q(x)^{-1}\,\tilde\pi(dx)}\int_S\int_S\big[f(y)-f(x)\big]\,a(x,dy)\,\tilde\pi(dx)=0.$$

By Echeverria's Theorem [6, Theorem 4.9.17], $\pi$ is an invariant distribution of $X(t)$.
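On a finite state space, (2.2) and the identity $\int_S\mathcal{L}f\,d\pi=0$ can be checked in exact rational arithmetic. The following is a minimal Python sketch. The three-state example (a random walk on the path 0-1-2 with $q=(1,3,2)$, whose embedded chain has invariant distribution `pi_tilde` $=(1/4,1/2,1/4)$) and the helper names are illustrative assumptions, not taken from the paper.

```python
from fractions import Fraction as Fr

def jump_invariant(pi_tilde, q):
    """pi from (2.2): normalize pi_tilde(x) / q(x), where pi_tilde is invariant for a."""
    w = [p / qx for p, qx in zip(pi_tilde, q)]
    z = sum(w)
    return [wx / z for wx in w]

def integral_of_Lf(pi, a, q, f):
    """Return sum_x pi(x) (L f)(x), with (L f)(x) = q(x) sum_y (f(y) - f(x)) a(x, y) as in (1.8)."""
    n = len(q)
    return sum(pi[x] * q[x] * sum((f[y] - f[x]) * a[x][y] for y in range(n))
               for x in range(n))

a = [[0, 1, 0], [Fr(1, 2), 0, Fr(1, 2)], [0, 1, 0]]   # embedded chain: walk on the path 0-1-2
pi_tilde = [Fr(1, 4), Fr(1, 2), Fr(1, 4)]            # invariant distribution of a
q = [1, 3, 2]
pi = jump_invariant(pi_tilde, q)
print(pi)  # [Fraction(6, 13), Fraction(4, 13), Fraction(3, 13)]
```

The invariant distribution of $a$ itself is in general not invariant for $X(t)$: with nonconstant $q$, `integral_of_Lf(pi_tilde, a, q, f)` is nonzero for some $f$, which is why the reweighting by $1/q$ in (2.2) is needed.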

Condition 2.3 $a$ satisfies the following transitivity condition. There exist positive integers $l_0$ and $n_0$ such that for all $x$ and $z$ in $S$

$$\sum_{i=l_0}^{\infty}2^{-i}a^{(i)}(x,dy)\ll\sum_{j=n_0}^{\infty}2^{-j}a^{(j)}(z,dy),$$

where $a^{(k)}$ denotes the $k$-step transition probability.

Remark 2.4 Under this condition, $\tilde\pi$ is the unique invariant distribution of $a$ [4, Lemma 8.6.2]. Thus $\pi$ defined by (2.2) is the unique probability distribution that satisfies $\int_S(\mathcal{L}f(x))\,\pi(dx)=0$ for all $f\in\mathcal{D}$, and hence by [6, Theorem 4.9.17] it is the unique invariant distribution of $X(t)$.
Condition 2.5 There exist an integer $N$ and a positive real number $c$ such that

for all $x\in S$. As before, we use $a^{(N)}$ to denote the $N$-step transition probability.

Remark 2.6 This type of assumption is common in the large deviation analysis of empirical measures. See, e.g., [3, Hypothesis 1.1].

Condition 2.7 The support of $\pi$ is $S$.

Remark 2.8 This condition guarantees that any probability measure $\eta\in\mathcal{P}(S)$ can be approximated by measures that are absolutely continuous with respect to $\pi$. Indeed, by applying the Lebesgue Decomposition Theorem [?, Theorem 3.8], one obtains a Borel measurable function $\theta\in L^1(\pi)$ and a subprobability measure $\eta^s$ on $S$ such that $\theta\ge 0$, $\eta^s\perp\pi$, and for any Borel set $A\subset S$,

$$\eta(A)=\int_A\theta(x)\,\pi(dx)+\eta^s(A).$$

If $\eta^s(S)>0$, then one can find a subset $S_1\subset S$ such that $\pi(S_1)=0$, $\eta^s(S_1)>0$ and $\eta^s(S\setminus S_1)=0$. For any $x\in S_1$ and any open neighborhood $N_x$ of $x$, since the support of $\pi$ is $S$ we have $\pi(N_x)>0$, which implies that $N_x\cap S_1=\{x\}$. Hence $S_1$ only contains isolated points, i.e., $\eta^s$ is a discrete measure. A discrete measure can be approximated in the weak topology by measures that are absolutely continuous with respect to $\pi$ since the support of $\pi$ is $S$.

Remark 2.9 Condition 2.7 excludes the existence of transient states. Although one can obtain an LDP for processes $X(t)$ that have transient states, one would end up with a rate function that depends on the initial state.

Reversibility is always required in order to obtain an explicit formula for the rate function. Recall that $\mathcal{D}$ is the domain of $\mathcal{L}$. In this paper, we assume $\mathcal{L}$ is self-adjoint (or reversible) under $\pi$ in the following sense: for any $f,g\in\mathcal{D}$

$$\int_S(\mathcal{L}f(x))\,g(x)\,\pi(dx)=\int_S(\mathcal{L}g(x))\,f(x)\,\pi(dx). \tag{2.3}$$

An equivalent condition for (2.3) to hold is the "detailed balance" condition, i.e., the equality of measures on $S\times S$

$$q(x)\,a(x,dy)\,\pi(dx)=q(y)\,a(y,dx)\,\pi(dy). \tag{2.4}$$

Note that (2.4) directly implies $\int_S(\mathcal{L}f(x))\,\pi(dx)=0$ for all $f\in\mathcal{D}$.
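For a finite state space, (2.4) reduces to $q(x)a(x,y)\pi(x)=q(y)a(y,x)\pi(y)$ for all $x,y$, and (2.3) becomes the symmetry of $\mathcal{L}$ in the $\pi$-weighted inner product. The sketch below checks both in exact arithmetic on an illustrative reversible example (a walk on the path 0-1-2 with $q=(1,3,2)$ and $\pi=(6/13,4/13,3/13)$, which is the measure (2.2) for this chain), and contrasts it with a deterministic three-state cycle, which is not reversible. The examples and function names are assumptions for illustration.

```python
from fractions import Fraction as Fr

def detailed_balance_holds(pi, a, q):
    """Check (2.4) on a finite state space: q(x) a(x,y) pi(x) == q(y) a(y,x) pi(y) for all x, y."""
    n = len(q)
    return all(q[x] * a[x][y] * pi[x] == q[y] * a[y][x] * pi[y]
               for x in range(n) for y in range(n))

def apply_L(a, q, f):
    """(L f)(x) = q(x) sum_y (f(y) - f(x)) a(x, y), as in (1.8)."""
    n = len(q)
    return [q[x] * sum((f[y] - f[x]) * a[x][y] for y in range(n)) for x in range(n)]

def inner(pi, f, g):
    """The L^2(pi) inner product sum_x f(x) g(x) pi(x)."""
    return sum(p * fx * gx for p, fx, gx in zip(pi, f, g))

# Reversible example: walk on the path 0-1-2 with q = (1, 3, 2).
a = [[0, 1, 0], [Fr(1, 2), 0, Fr(1, 2)], [0, 1, 0]]
q = [1, 3, 2]
pi = [Fr(6, 13), Fr(4, 13), Fr(3, 13)]
f, g = [1, 4, -2], [3, 0, 5]
print(detailed_balance_holds(pi, a, q))                                    # True
print(inner(pi, apply_L(a, q, f), g) == inner(pi, apply_L(a, q, g), f))    # True, i.e. (2.3)

# Non-reversible example: deterministic cycle 0 -> 1 -> 2 -> 0 with q = 1 and pi uniform.
cycle = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]
print(detailed_balance_holds([Fr(1, 3)] * 3, cycle, [1, 1, 1]))            # False
```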
3 A large deviation principle

3.1 Definition of rate function

In this subsection, we give the definition of the rate function $I$. In later sections we prove that $I$ thus defined is the correct form of the large deviation rate function for the empirical measures of the Markov jump processes. All conditions stated in Section 2 will be assumed throughout the rest of the paper. We wish to study the large deviation principle for the empirical measures $\eta_T\in\mathcal{P}(S)$ defined by (1.1). Under compactness of $S$ and Condition 2.1, $\eta_T$ converges in distribution to an invariant distribution of $\mathcal{L}$. As pointed out in Remark 2.4, $\pi$ is the unique invariant distribution of $\mathcal{L}$, and thus $\eta_T$ converges in distribution to $\pi$. Let $\mathcal{H}$ be the collection of all distributions that are absolutely continuous with respect to $\pi$, i.e.,

$$\mathcal{H}=\{\eta\in\mathcal{P}(S):\eta\ll\pi\}. \tag{3.1}$$

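For $\eta\in\mathcal{H}$, and assuming that the integral is well defined, consider

$$-\int_S\theta^{1/2}(x)\,\big(\mathcal{L}\theta^{1/2}\big)(x)\,\pi(dx),$$

where $\theta=d\eta/d\pi$. This is a rewriting of $\|(-\bar{\mathcal{L}})^{1/2}g\|^2$ in (1.7). By inserting the form of $\mathcal{L}$ in (1.8), we obtain the candidate rate function

$$I(\eta)=\int_S q(x)\,\eta(dx)-\int_{S\times S}\theta^{1/2}(x)\,\theta^{1/2}(y)\,q(x)\,a(x,dy)\,\pi(dx). \tag{3.2}$$

Note that by applying (2.4) and using the Cauchy-Schwarz inequality, one can prove that $I$ defined by (3.2) is nonnegative. Indeed, by the Cauchy-Schwarz inequality the double integral in (3.2) is at most $\left(\int\theta(x)\,q(x)a(x,dy)\pi(dx)\right)^{1/2}\left(\int\theta(y)\,q(x)a(x,dy)\pi(dx)\right)^{1/2}$, and by (2.4) both factors equal $\left(\int_S q\,d\eta\right)^{1/2}$. Recall that $K_2$ is the upper bound of $q$ as in (2.1), and thus $I$ is bounded above by $K_2$. In addition, it is straightforward to show that $I$ is convex on $\mathcal{H}$.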
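On a finite state space every $\eta$ is absolutely continuous with respect to $\pi$ (by Condition 2.7), so (3.2) can be evaluated directly. The sketch below (Python; the example chain, the measure `eta` and the function names are illustrative assumptions) computes $I(\eta)$ from (3.2) and compares it with the Donsker-Varadhan expression $-\int_S(\mathcal{L}u/u)\,d\eta$ from (1.2). For this reversible example the value at $u=\theta^{1/2}$ coincides with (3.2), other choices of $u>0$ give smaller values, as the supremum in (1.2) requires, and, since $a(x,x)=0$ here, $I(\delta_x)=q(x)\le K_2$.

```python
import math

def rate_function(eta, pi, a, q):
    """I(eta) from (3.2) on a finite state space, with theta = d eta / d pi."""
    n = len(q)
    theta = [e / p for e, p in zip(eta, pi)]
    first = sum(q[x] * eta[x] for x in range(n))
    second = sum(math.sqrt(theta[x] * theta[y]) * q[x] * a[x][y] * pi[x]
                 for x in range(n) for y in range(n))
    return first - second

def dv_functional(eta, u, a, q):
    """-sum_x eta(x) (L u)(x) / u(x): the expression whose supremum over u > 0 is (1.2)."""
    n = len(q)
    Lu = [q[x] * sum((u[y] - u[x]) * a[x][y] for y in range(n)) for x in range(n)]
    return -sum(eta[x] * Lu[x] / u[x] for x in range(n))

# Reversible path example: q = (1, 3, 2), pi = (6/13, 4/13, 3/13), so K_2 = 3.
a = [[0, 1, 0], [0.5, 0, 0.5], [0, 1, 0]]
q = [1, 3, 2]
pi = [6 / 13, 4 / 13, 3 / 13]
eta = [0.2, 0.3, 0.5]

I_eta = rate_function(eta, pi, a, q)
u_star = [math.sqrt(e / p) for e, p in zip(eta, pi)]   # u = theta^{1/2}
print(I_eta)                                # approximately 0.158
print(dv_functional(eta, u_star, a, q))     # the same value
print(rate_function([1, 0, 0], pi, a, q))   # equals q(0) = 1
```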

We want to extend the definition of $I$ to all measures in $\mathcal{P}(S)$. As pointed out in Remark 2.8, $\mathcal{H}$ is dense in $\mathcal{P}(S)$ under the topology of weak convergence. Hence we can extend the definition of $I$ to all of $\mathcal{P}(S)$ via lower semicontinuous regularization with respect to the topology of weak convergence. Thus if $\eta_n\to\eta$ weakly and $\{\eta_n\}\subset\mathcal{H}$, then $\liminf_{n\to\infty}I(\eta_n)\ge I(\eta)$, and equality holds for at least one such sequence. This extension guarantees that the extended $I$ is convex, lower semicontinuous and bounded above by $K_2$ on all of $\mathcal{P}(S)$. The compactness of $S$ and the lower semicontinuity of $I$ ensure that $I$ has compact level sets. Being a nonnegative, lower semicontinuous function with compact level sets, $I$ indeed is a valid large deviation rate function.

We have finished the definition of the rate function I, and are now ready to state 
the large deviation principle. 

3.2 A large deviation principle 

Our main result is the following: 
Theorem 3.1 Let $X(t)$ be a Markov jump process satisfying all the assumptions in Section 2. Let $I$ be defined as in Section 3.1. Then the large deviation bounds (1.4) and (1.5) hold.

To prove Theorem 3.1 it suffices to show the equivalent Laplace principle [4, Theorem 1.2.3]. Specifically, we establish that for any bounded continuous function $F:\mathcal{P}(S)\to\mathbb{R}$

$$\lim_{T\to\infty}-\frac{1}{T}\log E\left[\exp\{-TF(\eta_T)\}\right]=\inf_{\eta\in\mathcal{P}(S)}\left[F(\eta)+I(\eta)\right]. \tag{3.3}$$

By adding a constant to both sides of (3.3) we can assume $F\ge 0$, and we do so for the rest of the paper. The proof is based on a weak convergence approach and is split into two parts: a Laplace upper bound and a Laplace lower bound.

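Relative entropy plays a key role in the proof, and we therefore state the definition and a few important properties. Details can be found in [3].

Definition 1 Let $(\mathcal{V},\mathcal{A})$ be a measurable space. For $\theta\in\mathcal{P}(\mathcal{V})$, the relative entropy $R(\cdot\|\theta)$ is a mapping from $\mathcal{P}(\mathcal{V})$ into the extended real numbers. It is defined by

$$R(\gamma\|\theta)=\int_{\mathcal{V}}\left(\log\frac{d\gamma}{d\theta}\right)d\gamma$$

when $\gamma\in\mathcal{P}(\mathcal{V})$ is absolutely continuous with respect to $\theta$ and $\log d\gamma/d\theta$ is integrable with respect to $\gamma$. Otherwise we set $R(\gamma\|\theta)=\infty$.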
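For probability vectors on a finite set, Definition 1 reduces to a finite sum. A minimal Python sketch (the function name is an assumption for illustration), using the convention $0\log 0=0$ and returning $\infty$ when $\gamma$ is not absolutely continuous with respect to $\theta$:

```python
import math

def relative_entropy(gamma, theta):
    """R(gamma || theta) for probability vectors on a finite set (Definition 1)."""
    total = 0.0
    for g, t in zip(gamma, theta):
        if g == 0:
            continue              # convention 0 log 0 = 0
        if t == 0:
            return math.inf       # gamma is not absolutely continuous w.r.t. theta
        total += g * math.log(g / t)
    return total

print(relative_entropy([0.5, 0.5], [0.5, 0.5]))  # 0.0
print(relative_entropy([1.0, 0.0], [0.5, 0.5]))  # log 2, about 0.693
print(relative_entropy([0.5, 0.5], [1.0, 0.0]))  # inf
```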

If $\mathcal{V}$ is a Polish space and $\mathcal{A}$ the associated $\sigma$-algebra, then $R(\cdot\|\cdot)$ is nonnegative, convex and lower semicontinuous in both variables (with respect to the weak topology on $\mathcal{P}(\mathcal{V})$). We state the following two properties of relative entropy.

Lemma 3.2 (Variational formula) Let $(\mathcal{V},\mathcal{A})$ be a measurable space, $k$ a bounded measurable function mapping $\mathcal{V}$ into $\mathbb{R}$, and $\theta$ a probability measure on $\mathcal{V}$. The following conclusions hold.

(a) We have the variational formula

$$-\log\int_{\mathcal{V}}e^{-k}\,d\theta=\inf_{\gamma\in\mathcal{P}(\mathcal{V})}\left[R(\gamma\|\theta)+\int_{\mathcal{V}}k\,d\gamma\right]. \tag{3.4}$$

(b) The infimum in (3.4) is attained uniquely at $\gamma_0$ defined by

$$\frac{d\gamma_0}{d\theta}(x)=e^{-k(x)}\Big/\int_{\mathcal{V}}e^{-k}\,d\theta.$$
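Lemma 3.2 can be checked numerically on a finite set $\mathcal{V}$: the tilted measure $\gamma_0$ of part (b) attains the infimum in (3.4), and any other $\gamma$ gives a value at least as large. A minimal sketch; the particular $\theta$, $k$ and comparison measures are arbitrary illustrative choices.

```python
import math

def relative_entropy(gamma, theta):
    """R(gamma || theta) on a finite set, with 0 log 0 = 0 (Definition 1)."""
    if any(g > 0 and t == 0 for g, t in zip(gamma, theta)):
        return math.inf
    return sum(g * math.log(g / t) for g, t in zip(gamma, theta) if g > 0)

def objective(gamma, theta, k):
    """R(gamma || theta) + integral of k with respect to gamma: the quantity minimized in (3.4)."""
    return relative_entropy(gamma, theta) + sum(g * kv for g, kv in zip(gamma, k))

theta = [0.1, 0.2, 0.3, 0.4]
k = [2.0, -1.0, 0.5, 3.0]

z = sum(t * math.exp(-kv) for t, kv in zip(theta, k))       # integral of e^{-k} d theta
lhs = -math.log(z)                                          # left side of (3.4)
gamma0 = [t * math.exp(-kv) / z for t, kv in zip(theta, k)] # minimizer from Lemma 3.2(b)

print(lhs, objective(gamma0, theta, k))   # equal up to rounding
for gamma in ([0.25, 0.25, 0.25, 0.25], [0.0, 1.0, 0.0, 0.0], theta):
    print(objective(gamma, theta, k))     # each value is at least lhs
```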

Theorem 3.3 (Chain rule) Let $\mathcal{X}$ and $\mathcal{Y}$ be Polish spaces and $\beta$ and $\gamma$ probability measures on $\mathcal{X}\times\mathcal{Y}$. We denote by $[\beta]_1$ and $[\gamma]_1$ the first marginals of $\beta$ and $\gamma$ and by $\beta(dy|x)$ and $\gamma(dy|x)$ the stochastic kernels on $\mathcal{Y}$ given $\mathcal{X}$ for which we have the decompositions

$$\beta(dx\times dy)=[\beta]_1(dx)\otimes\beta(dy|x)\quad\text{and}\quad\gamma(dx\times dy)=[\gamma]_1(dx)\otimes\gamma(dy|x).$$

Then the function mapping $x\in\mathcal{X}\mapsto R(\beta(\cdot|x)\|\gamma(\cdot|x))$ is measurable and

$$R(\beta\|\gamma)=R\big([\beta]_1\|[\gamma]_1\big)+\int_{\mathcal{X}}R\big(\beta(\cdot|x)\|\gamma(\cdot|x)\big)\,[\beta]_1(dx).$$
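On finite sets $\mathcal{X}$ and $\mathcal{Y}$ the chain rule can be verified directly by decomposing each joint law into its first marginal and its conditional rows. The sketch below uses two arbitrary strictly positive joint distributions, chosen for illustration; the function names are hypothetical.

```python
import math

def relative_entropy(p, r):
    """R(p || r) for strictly positive probability vectors on a finite set."""
    return sum(pv * math.log(pv / rv) for pv, rv in zip(p, r))

def chain_rule_sides(beta, gamma):
    """Return (R(beta || gamma), right-hand side of Theorem 3.3) for joint laws given as matrices."""
    flat_b = [v for row in beta for v in row]
    flat_g = [v for row in gamma for v in row]
    lhs = relative_entropy(flat_b, flat_g)
    beta1 = [sum(row) for row in beta]    # first marginal [beta]_1
    gamma1 = [sum(row) for row in gamma]  # first marginal [gamma]_1
    rhs = relative_entropy(beta1, gamma1)
    for x, (brow, grow) in enumerate(zip(beta, gamma)):
        cond_b = [v / beta1[x] for v in brow]    # beta(dy | x)
        cond_g = [v / gamma1[x] for v in grow]   # gamma(dy | x)
        rhs += beta1[x] * relative_entropy(cond_b, cond_g)
    return lhs, rhs

beta = [[0.10, 0.20, 0.05], [0.15, 0.30, 0.20]]
gamma = [[0.20, 0.10, 0.10], [0.10, 0.25, 0.25]]
print(chain_rule_sides(beta, gamma))  # the two numbers agree up to rounding
```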

We devote the next two sections to proving the Laplace upper bound and the Laplace lower bound, respectively.

4 Proof of the Laplace upper bound

In this section, we prove the Laplace upper bound part of (3.3), i.e.,

$$\liminf_{T\to\infty}-\frac{1}{T}\log E\left[\exp\{-TF(\eta_T)\}\right]\ge\inf_{\eta\in\mathcal{P}(S)}\left[F(\eta)+I(\eta)\right]. \tag{4.1}$$



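Recalling the construction of $X(t)$ in the Introduction, we define a random integer $R_T$ as the index when the total "waiting time" first exceeds $T$, i.e.,

$$\sum_{i=1}^{R_T-1}s_i\le T<\sum_{i=1}^{R_T}s_i. \tag{4.2}$$

Then the empirical measure $\eta_T$ can be written as

$$\eta_T(\cdot)=\frac{1}{T}\int_0^T\delta_{X(t)}(\cdot)\,dt=\frac{1}{T}\left[\sum_{i=1}^{R_T-1}s_i\,\delta_{X_{i-1}}(\cdot)+\left(T-\sum_{i=1}^{R_T-1}s_i\right)\delta_{X_{R_T-1}}(\cdot)\right]. \tag{4.3}$$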
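Formulas (4.2) and (4.3) say that $\eta_T$ is determined by the visited states and the sojourn times up to the $R_T$-th sojourn, which straddles time $T$. A minimal Python sketch on a finite state space; the function name and the input convention `sojourns[i]` $=s_{i+1}$ (the time spent in $X_i$) are assumptions for illustration.

```python
def empirical_measure_from_sojourns(states, sojourns, T, n_states):
    """Return (R_T, eta_T) following (4.2)-(4.3).

    states[i]   = X_i, the i-th state of the embedded chain (i = 0, 1, ...)
    sojourns[i] = s_{i+1}, the time spent in X_i
    """
    eta = [0.0] * n_states
    elapsed = 0.0                                # s_1 + ... + s_i
    for i, (x, s) in enumerate(zip(states, sojourns)):
        if elapsed + s > T:                      # s_1 + ... + s_{i+1} > T, so R_T = i + 1
            eta[x] += T - elapsed                # last term of (4.3), located at X_{R_T - 1}
            return i + 1, [w / T for w in eta]
        eta[x] += s                              # the term s_{i+1} delta_{X_i} in (4.3)
        elapsed += s
    raise ValueError("the sojourn times do not reach time T")

R_T, eta = empirical_measure_from_sojourns([0, 1, 0, 2], [0.5, 1.0, 0.25, 3.0], 2.0, 3)
print(R_T, eta)  # 4 [0.375, 0.5, 0.125]
```

Here $s_1+s_2+s_3=1.75\le 2<4.75=s_1+\dots+s_4$, so $R_T=4$ as in (4.2).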



The proof of (4.1) will be partitioned into two cases: $R_T/T>C$ and $0<R_T/T\le C$, where $C$ will be sent to $\infty$ after sending $T\to\infty$.



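4.1 The case $R_T/T>C$

Let $F:\mathcal{P}(S)\to\mathbb{R}$ be nonnegative and continuous. If $R_T/T>C$, then $R_T-1\ge\lfloor TC\rfloor$, so (4.2) gives $\sum_{i=1}^{\lfloor TC\rfloor}s_i\le T$; moreover $s_i=\tau_i/q(X_{i-1})\ge\tau_i/K_2$ by (1.10) and (2.1). Since $F\ge 0$, it follows that

$$-\frac{1}{T}\log E\left[1_{(C,\infty)}(R_T/T)\,e^{-TF(\eta_T)}\right]\ge-\frac{1}{T}\log P\left(\sum_{i=1}^{\lfloor TC\rfloor}s_i\le T\right)\ge-\frac{1}{T}\log P\left(\sum_{i=1}^{\lfloor TC\rfloor}\tau_i\le K_2T\right).$$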
Using Chebyshev's inequality, for any $a\in(0,\infty)$

$$P\left(\sum_{i=1}^{\lfloor TC\rfloor}\tau_i\le K_2T\right)\le e^{aK_2T}\,E\left[\exp\left\{-a\sum_{i=1}^{\lfloor TC\rfloor}\tau_i\right\}\right]=e^{aK_2T}\,e^{-\lfloor TC\rfloor\log(1+a)}.$$

For the last equality we have used the fact that if $\tau$ is exponentially distributed with mean 1 then $Ee^{-a\tau}=1/(1+a)$ for any $a\in(-1,\infty)$. Combining the last two inequalities, we have, for $C>K_2$,

$$\liminf_{T\to\infty}-\frac{1}{T}\log E\left[1_{(C,\infty)}(R_T/T)\exp\{-TF(\eta_T)\}\right]\ge\sup_{a\in(0,\infty)}\left[-K_2a+C\log(1+a)\right]=K_2-C+C\log C-C\log K_2,$$

where the supremum is attained at $a=C/K_2-1$. Note that $K_2-C+C\log C-C\log K_2\to\infty$ as $C\to\infty$.

4.2 The case $0<R_T/T\le C$

4.2.1 A stochastic control representation

In this case we adapt a standard weak convergence argument; see [3] for details. Specifically, we first establish a stochastic control representation for the left hand side of (3.3) and then obtain a lower bound for the limit as $T\to\infty$. In the representation, all distributions can be perturbed from their original form, but such a perturbation pays a relative entropy cost. We distinguish the new distributions and random variables by an overbar. In the following, the barred quantities are constructed analogously to their unbarred counterparts. Hence $\bar\tau_i$ and $\bar X_i$ are chosen recursively according to stochastic kernels $\bar\sigma_i(\cdot)$ and $\bar\alpha_{i-1}(\cdot)$, i.e., $\bar\sigma_i(\cdot)$ and $\bar\alpha_{i-1}(\cdot)$ are conditional distributions that can depend on the whole past. Specifically, $\bar\sigma_i(\cdot)$ depends on $\{\bar X_0,\bar\tau_1,\bar X_1,\bar\tau_2,\ldots,\bar\tau_{i-1},\bar X_{i-1}\}$ and $\bar\alpha_{i-1}(\cdot)$ depends on $\{\bar X_0,\bar\tau_1,\bar X_1,\bar\tau_2,\ldots,\bar X_{i-1},\bar\tau_i\}$; $\bar s_i$ is defined by (1.10) using $\bar X_{i-1}$ and $\bar\tau_i$; $\bar R_T$ is defined by (4.2) using $\bar s_i$; and $\bar\eta_T$ is defined by (4.3) using $\bar X_i$, $\bar s_i$ and $\bar R_T$. It will be sufficient to consider any deterministic sequence $\{r_T\}$ such that $0<r_T/T\le C$, and $r_T/T\to A$ for some $A\in[0,C]$ as $T\to\infty$. We restrict consideration to controlled processes such that $\bar R_T=r_T$ by placing an infinite cost penalty on controls which lead to any other outcome with positive probability. Let $1(A)$ denote the indicator function of a set $A$. By applying [4, Proposition 4.5.1] and Theorem 3.3, the following is valid:



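$$-\frac{1}{T}\log E\left[\exp\left\{-TF(\eta_T)-T\cdot\infty\cdot 1_{\{r_T/T\}^c}(R_T/T)\right\}\right] \tag{4.4}$$

$$=-\frac{1}{T}\log E\left[\exp\left\{-TF(\eta_T)-T\cdot\infty\cdot 1\left(\left\{\sum_{i=1}^{r_T-1}s_i\le T<\sum_{i=1}^{r_T}s_i\right\}^c\right)\right\}\right]$$

$$=\inf E\left[F(\bar\eta_T)+\infty\cdot 1\left(\left\{\sum_{i=1}^{r_T-1}\bar s_i\le T<\sum_{i=1}^{r_T}\bar s_i\right\}^c\right)+\frac{1}{T}\sum_{i=1}^{r_T}\left[R\big(\bar\alpha_{i-1}\|a(\bar X_{i-1},\cdot)\big)+R(\bar\sigma_i\|\sigma)\right]\right], \tag{4.5}$$

where $\sigma$ denotes the exponential distribution with mean 1 and the infimum is taken over all control measures $\{\bar\alpha_i,\bar\sigma_i\}$. Since in Section 5 we will prove a similar but more involved representation formula, we omit the proof of this representation. Due to the restriction $\bar R_T=r_T$, one can write $\bar\eta_T$ as

$$\bar\eta_T(\cdot)=\frac{1}{T}\left[\sum_{i=1}^{r_T-1}\bar s_i\,\delta_{\bar X_{i-1}}(\cdot)+\left(T-\sum_{i=1}^{r_T-1}\bar s_i\right)\delta_{\bar X_{r_T-1}}(\cdot)\right]. \tag{4.6}$$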



In the following proof, we repeatedly extract further subsequences of $T$. To keep the notation concise, we abuse notation and use $T$ to denote all subsequences. Note also that in proving a lower bound for (4.4) it suffices to consider a subsequence of $T$ such that

$$\sup_T-\frac{1}{T}\log E\left[\exp\left\{-TF(\eta_T)-T\cdot\infty\cdot 1_{\{r_T/T\}^c}(R_T/T)\right\}\right]<\infty. \tag{4.7}$$

We assume this condition for the rest of this subsection.

The relative entropy cost in (4.5) includes two parts, $RE^1_T=\frac{1}{T}\sum_{i=1}^{r_T}R\big(\bar\alpha_{i-1}\|a(\bar X_{i-1},\cdot)\big)$ and $RE^2_T=\frac{1}{T}\sum_{i=1}^{r_T}R(\bar\sigma_i\|\sigma)$. We will prove that for any sequence of controls $\{\bar\alpha_i,\bar\sigma_i\}$ in (4.5)

$$\liminf_{T\to\infty}E\left[F(\bar\eta_T)+RE^1_T+RE^2_T\right]\ge\inf_{\eta\in\mathcal{P}(S)}\left[F(\eta)+I(\eta)\right]. \tag{4.8}$$

Toward this end, it is enough to show that along any subsequence of $T$ such that $r_T/T\to A$, we can extract a further subsequence along which (4.8) holds. In addition, it suffices to consider only functions $F$ that, besides being nonnegative, are also lower semicontinuous and convex. This restriction is valid since $I$ is convex and lower semicontinuous, and follows a standard argument in the large deviation literature. The interested reader can find the details in [8].

In light of (4.5) and (4.7) we assume without loss of generality

$$\sup_T E\left[F(\bar\eta_T)+RE^1_T+RE^2_T\right]<\infty. \tag{4.9}$$

Since the proof of (4.8) is lengthy, we analyze each term on the left hand side of (4.8) in the following subsections.
4.2.2 The term $RE^1_T$

The cost $RE^1_T$ comes from distorting the dynamics of the embedded Markov chain, and indeed the analysis gives a very similar conclusion to that of an ordinary Markov chain ([4, Chapter 8]). For any probability measure $\nu$ on $S\times S$ we will use the notations $[\nu]_1$ and $[\nu]_2$ to denote the first and second marginals of $\nu$. We have the following result for $RE^1_T$.

Lemma 4.1 Consider any sequence of controls $\{\bar\alpha_i,\bar\sigma_i\}$ in (4.5) such that (4.9) holds. Along any subsequence of $T$ satisfying $r_T/T\to A$, define a sequence of random probability measures on $S\times S$ via

$$\mu_T(dx,dy)=\frac{1}{r_T}\sum_{i=1}^{r_T}\delta_{\bar X_{i-1}}(dx)\,\bar\alpha_{i-1}(dy).$$

Then one can extract a further subsequence such that $E\mu_T$ converges weakly to a probability measure $\mu$ on $S\times S$, and

$$\liminf_{T\to\infty}E\left[RE^1_T\right]\ge A\,R\big(\mu\|[\mu]_1\otimes a\big).$$

Furthermore, if $A>0$ then $\mu$ satisfies

$$[\mu]_1=[\mu]_2. \tag{4.10}$$
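Proof. By the chain rule (Theorem 3.3) and the convexity of relative entropy,

$$E\left[RE^1_T\right]=\frac{r_T}{T}E\left[\frac{1}{r_T}\sum_{i=1}^{r_T}R\big(\delta_{\bar X_{i-1}}\otimes\bar\alpha_{i-1}\,\big\|\,\delta_{\bar X_{i-1}}\otimes a\big)\right]\ge\frac{r_T}{T}E\left[R\big(\mu_T\|[\mu_T]_1\otimes a\big)\right]\ge\frac{r_T}{T}R\big(E\mu_T\|[E\mu_T]_1\otimes a\big).$$

Since $S\times S$ is compact, for any subsequence of $T$ there exists a further subsequence along which $E\mu_T$ converges weakly to a probability measure $\mu$. Under the Feller property of $a$ (Condition 2.1), $[E\mu_T]_1\otimes a$ converges weakly to $[\mu]_1\otimes a$. The lower semicontinuity of relative entropy then implies

$$\liminf_{T\to\infty}E\left[RE^1_T\right]\ge\liminf_{T\to\infty}\frac{r_T}{T}R\big(E\mu_T\|[E\mu_T]_1\otimes a\big)\ge A\,R\big(\mu\|[\mu]_1\otimes a\big).$$

This finishes the first part of Lemma 4.1. For the second part, we employ a standard martingale argument. Let $\mathcal{F}_i$ be the $\sigma$-algebra generated by the random variables $\{(\bar X_0,\ldots,\bar X_i),(\bar\tau_1,\ldots,\bar\tau_i)\}$. Thus $\{\mathcal{F}_i\}$ is an increasing sequence of $\sigma$-algebras and, since $\bar\alpha_i$ selects the conditional distribution of $\bar X_{i+1}$, for any bounded continuous function $f$ on $S$

$$E\left[f(\bar X_{i+1})-\int_S f(y)\,\bar\alpha_i(dy)\,\Big|\,\mathcal{F}_i\right]=0.$$

Hence for integers $0\le i<k\le r_T-1$

$$E\left[\left(f(\bar X_{i+1})-\int_S f(y)\,\bar\alpha_i(dy)\right)\left(f(\bar X_{k+1})-\int_S f(y)\,\bar\alpha_k(dy)\right)\right]=0.$$

Moreover,

$$\int_{S\times S}f(x)\,\mu_T(dx,dy)-\int_{S\times S}f(y)\,\mu_T(dx,dy)=\frac{1}{r_T}\sum_{i=0}^{r_T-1}\left(f(\bar X_{i+1})-\int_S f(y)\,\bar\alpha_i(dy)\right)+\frac{1}{r_T}\big(f(\bar X_0)-f(\bar X_{r_T})\big),$$

and by the orthogonality above, for any bounded continuous function $f$ on $S$

$$E\left[\left(\frac{1}{r_T}\sum_{i=0}^{r_T-1}\left(f(\bar X_{i+1})-\int_S f(y)\,\bar\alpha_i(dy)\right)\right)^2\right]=\frac{1}{r_T^2}\sum_{i=0}^{r_T-1}E\left[\left(f(\bar X_{i+1})-\int_S f(y)\,\bar\alpha_i(dy)\right)^2\right]\le\frac{4\|f\|_\infty^2}{r_T}.$$

Since $0<A=\lim_{T\to\infty}r_T/T$, we have $r_T/T>A/2$ for all $T$ large enough, so $r_T\to\infty$. Using Chebyshev's inequality and the last display, together with $|f(\bar X_0)-f(\bar X_{r_T})|/r_T\le 2\|f\|_\infty/r_T$, we conclude that $\int f\,d[\mu_T]_1-\int f\,d[\mu_T]_2$ converges to 0 in probability as $T\to\infty$. Since this difference is bounded by $2\|f\|_\infty$, its expectation $\int f\,d[E\mu_T]_1-\int f\,d[E\mu_T]_2$ also converges to 0, and therefore $[\mu]_1=[\mu]_2$. This concludes the second part of Lemma 4.1. ■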
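The martingale argument can be illustrated in the uncontrolled case $\bar\alpha_i=a(\bar X_i,\cdot)$, an assumption made only for this illustration. The sketch below simulates an embedded chain and measures the largest gap between the first marginal $[\mu_n]_1$ (the occupation frequencies of $X_0,\ldots,X_{n-1}$) and the second marginal $[\mu_n]_2=\frac1n\sum_{i=1}^n a(X_{i-1},\cdot)$. By the second-moment bound above, the gap should be of order $n^{-1/2}$. The chain, seed and function name are illustrative.

```python
import random

def marginal_gap(a, x0, n, rng):
    """max_y |[mu_n]_1(y) - [mu_n]_2(y)| for mu_n(dx, dy) = (1/n) sum_{i=1}^n delta_{X_{i-1}}(dx) a(X_{i-1}, dy)."""
    m = len(a)
    first = [0.0] * m    # (1/n) sum_i delta_{X_{i-1}}
    second = [0.0] * m   # (1/n) sum_i a(X_{i-1}, .)
    x = x0
    for _ in range(n):
        first[x] += 1.0 / n
        for y in range(m):
            second[y] += a[x][y] / n
        x = rng.choices(range(m), weights=a[x])[0]  # X_i ~ a(X_{i-1}, .)
    return max(abs(f - s) for f, s in zip(first, second))

a = [[0, 1, 0], [0.5, 0, 0.5], [0, 1, 0]]
gap = marginal_gap(a, 0, 20_000, random.Random(2))
print(gap)  # small, of order n ** -0.5
```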

4.2.3 The term $RE^2_T$

We now turn to the second cost $RE^2_T$. This cost comes from distorting the exponential sojourn times. We introduce a function $\ell$ which is closely related to the relative entropy of exponential distributions: $\ell(x)=x\log x-x+1$ for any $x\ge 0$ (with $0\log 0=0$).

Lemma 4.2 Given any sequence of controls $\{\bar\alpha_i,\bar\sigma_i\}$, fix a subsequence of $T$ for which the conclusions in Lemma 4.1 hold. Then we can extract a further subsequence along which

$$\liminf_{T\to\infty}E\left[RE^2_T\right]\ge\int_{S\times\mathbb{R}_+}\ell(u)\,\xi(dx,du).$$

Here $\xi$ is a finite measure on $S\times\mathbb{R}_+$ and is related to $\mu$ in Lemma 4.1 by

$$\int_{\mathbb{R}_+}u\,\xi(dx,du)=A\,[\mu]_1(dx). \tag{4.11}$$

Before proving this lemma, we need to define $g:\mathbb{R}_+\to\mathbb{R}$ by $g(b)=-\log b+b-1$. The functions $g$ and $\ell$ are related by

$$g(x)=x\,\ell(1/x),$$

and $g$ has the following property.
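Lemma 4.3 Let $\sigma$ be an exponential distribution with mean 1. Then

$$\inf\left\{R(\gamma\|\sigma):\int_{\mathbb{R}_+}u\,\gamma(du)=b\right\}=g(b). \tag{4.12}$$

Proof. Let $\sigma_b$ be the exponential distribution with mean $b$, i.e.,

$$\sigma_b(du)=\frac{1}{b}e^{-u/b}\,du.$$

Then $\frac{d\sigma_b}{d\sigma}(u)=\frac{1}{b}e^{(1-1/b)u}$ for $u\ge 0$. Picking any $\gamma$ with mean $b$ such that $R(\gamma\|\sigma)<\infty$,

$$R(\gamma\|\sigma)=\int_{\mathbb{R}_+}\log\left(\frac{d\gamma}{d\sigma_b}\right)(u)\,\gamma(du)+\int_{\mathbb{R}_+}\log\left(\frac{d\sigma_b}{d\sigma}\right)(u)\,\gamma(du)=R(\gamma\|\sigma_b)+\int_{\mathbb{R}_+}\left[-\log b+\left(1-\frac{1}{b}\right)u\right]\gamma(du)=R(\gamma\|\sigma_b)+g(b)\ge g(b),$$

and the infimum in (4.12) is achieved when $R(\gamma\|\sigma_b)=0$, i.e., $\gamma=\sigma_b$.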
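Lemma 4.3 can be checked numerically: the relative entropy of $\sigma_b$ with respect to $\sigma$ equals $g(b)$, while another law with the same mean $b$ has strictly larger relative entropy. For the comparison the sketch uses a Gamma law with shape 2 and scale $b/2$, chosen only for illustration. The integrals are approximated by a midpoint rule on a truncated interval; the step count, truncation point and function names are ad hoc choices.

```python
import math

def g(b):
    return -math.log(b) + b - 1.0

def rel_entropy_vs_exp1(density, upper, steps=100_000):
    """Midpoint-rule approximation of R(gamma || sigma), sigma = Exp(mean 1), for gamma with the
    given density on (0, upper]; log(d gamma / d sigma)(u) = log density(u) + u."""
    h = upper / steps
    total = 0.0
    for k in range(steps):
        u = (k + 0.5) * h
        p = density(u)
        if p > 0:
            total += p * (math.log(p) + u) * h
    return total

b = 2.5

def exp_density(u):
    return math.exp(-u / b) / b                        # sigma_b, mean b

def gamma_density(u):
    return u * math.exp(-2 * u / b) / (b / 2) ** 2     # Gamma(shape 2, scale b/2), also mean b

r_exp = rel_entropy_vs_exp1(exp_density, 60 * b)
r_gamma = rel_entropy_vs_exp1(gamma_density, 60 * b)
print(r_exp, g(b))   # approximately equal: the infimum g(b) is attained at sigma_b
print(r_gamma)       # strictly larger than g(b)
```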
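Proof of Lemma 4.2. Lemma 4.3 guarantees that

$$E\left[RE^2_T\right]=E\left[\frac{1}{T}\sum_{i=1}^{r_T}R(\bar\sigma_i\|\sigma)\right]\ge E\left[\frac{1}{T}\sum_{i=1}^{r_T}g\left(\int_{\mathbb{R}_+}u\,\bar\sigma_i(du)\right)\right]. \tag{4.13}$$

Recall the definition of $\mathcal{F}_i$ as the $\sigma$-algebra generated by the controlled process up to step $i$. Since $\bar\sigma_i$ selects the conditional distribution of $\bar\tau_i$,

$$E[\bar\tau_i\mid\mathcal{F}_{i-1}]=\int_{\mathbb{R}_+}u\,\bar\sigma_i(du).$$

Define $\bar m_i=\int_{\mathbb{R}_+}u\,\bar\sigma_i(du)$ for $i=1,\ldots,r_T-1$. The definition of $\bar m_{r_T}$ requires more work. Recalling the definition of $\bar R_T$ by the equation analogous to (4.2), the restriction $\bar R_T=r_T$, and $\bar s_i=\bar\tau_i/q(\bar X_{i-1})$,

$$T-\sum_{i=1}^{r_T-1}\frac{\bar\tau_i}{q(\bar X_{i-1})}<\frac{\bar\tau_{r_T}}{q(\bar X_{r_T-1})}.$$

Multiplying both sides by $q(\bar X_{r_T-1})$ and taking expectation conditioned on $\mathcal{F}_{r_T-1}$,

$$q(\bar X_{r_T-1})\left(T-\sum_{i=1}^{r_T-1}\frac{\bar\tau_i}{q(\bar X_{i-1})}\right)<E[\bar\tau_{r_T}\mid\mathcal{F}_{r_T-1}]=\int_{\mathbb{R}_+}u\,\bar\sigma_{r_T}(du).$$

Define

$$A_T=q(\bar X_{r_T-1})\left(T-\sum_{i=1}^{r_T-1}\frac{\bar\tau_i}{q(\bar X_{i-1})}\right), \tag{4.14}$$

which is nonnegative by (4.2), and define $\bar m_{r_T}$ by

$$\bar m_{r_T}=\begin{cases}\int_{\mathbb{R}_+}u\,\bar\sigma_{r_T}(du)&\text{if }A_T\le\int_{\mathbb{R}_+}u\,\bar\sigma_{r_T}(du)\le 1,\\ 1&\text{if }A_T\le 1\le\int_{\mathbb{R}_+}u\,\bar\sigma_{r_T}(du),\\ A_T&\text{if }1\le A_T\le\int_{\mathbb{R}_+}u\,\bar\sigma_{r_T}(du),\end{cases} \tag{4.15}$$

i.e., $\bar m_{r_T}$ is the median of the triplet $\left(A_T,\int_{\mathbb{R}_+}u\,\bar\sigma_{r_T}(du),1\right)$. Since $g$ is increasing on $(1,\infty)$, we have $g\left(\int_{\mathbb{R}_+}u\,\bar\sigma_{r_T}(du)\right)\ge g(\bar m_{r_T})$ in all three cases. Thus by (4.13),

$$E\left[RE^2_T\right]\ge E\left[\frac{1}{T}\sum_{i=1}^{r_T}g(\bar m_i)\right]. \tag{4.16}$$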
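The truncation (4.15) and the two facts used about it can be checked on a grid of admissible values $0\le A_T<\int u\,\bar\sigma_{r_T}(du)$. The first fact, $g(\int u\,\bar\sigma_{r_T}(du))\ge g(\bar m_{r_T})$, is used here. The second, $|A_T-\bar m_{r_T}|\le 1$, is used in the proof of Lemma 4.4. A minimal sketch; the function name and the grid are illustrative choices.

```python
import itertools
import math

def g(b):
    return -math.log(b) + b - 1.0

def truncated_mean(A_T, mean_last):
    """m_{r_T} from (4.15): the median of the triplet (A_T, mean_last, 1), assuming 0 <= A_T < mean_last."""
    return sorted([A_T, mean_last, 1.0])[1]

grid = [0.0, 0.05, 0.3, 0.9, 1.0, 1.7, 4.0, 25.0]
checked = 0
for A_T, mean_last in itertools.product(grid, grid):
    if not 0 <= A_T < mean_last:
        continue
    m = truncated_mean(A_T, mean_last)
    assert g(mean_last) >= g(m) - 1e-12   # passage from (4.13) to (4.16)
    assert abs(A_T - m) <= 1.0            # used in the proof of Lemma 4.4
    checked += 1
print(checked, "admissible pairs checked")
```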

Next consider the measure on $S\times\mathbb{R}_+$ defined by

$$\xi_T(dx,du)=\frac{1}{T}\sum_{i=1}^{r_T}\delta_{\bar X_{i-1}}(dx)\,\delta_{1/\bar m_i}(du)\,\bar m_i. \tag{4.17}$$

The total mass of $E\xi_T$ is

$$E\left[\frac{1}{T}\sum_{i=1}^{r_T}\bar m_i\right].$$

According to (4.9) and the assumption that $F\ge 0$, we have

$$\sup_T E\left[RE^2_T\right]<\infty. \tag{4.18}$$

By (4.16), $\sup_T E\left[\sum_{i=1}^{r_T}g(\bar m_i)/T\right]<\infty$. We also have by a straightforward calculation that $x\le\max\{50,10g(x)/9\}$. Using this and the fact that $r_T/T\le C$ we have $\sup_T E\left[\sum_{i=1}^{r_T}\bar m_i/T\right]<\infty$, i.e., the total mass of $E\xi_T$ has a bound uniform in $T$. Thus when viewed as a sequence of measures on the compact space $S\times[0,\infty]$, $E\xi_T$ is tight due to the uniform boundedness of the total mass. We denote the weak limit by $\xi$, which is a finite measure. Since the function $\ell$ is nonnegative and continuous,

$$\liminf_{T\to\infty}E\left[RE^2_T\right]\ge\liminf_{T\to\infty}E\left[\frac{1}{T}\sum_{i=1}^{r_T}g(\bar m_i)\right]=\liminf_{T\to\infty}\int_{S\times\mathbb{R}_+}\ell(u)\,E\xi_T(dx,du)\ge\int_{S\times\mathbb{R}_+}\ell(u)\,\xi(dx,du). \tag{4.19}$$
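Two elementary facts about $g$ and $\ell$ are used above. The identity $g(x)=x\,\ell(1/x)$ gives $\int\ell(u)\,\xi_T(dx,du)=\frac1T\sum_i g(\bar m_i)$. The inequality $x\le\max\{50,10g(x)/9\}$ converts the entropy bound into a bound on the total mass of $E\xi_T$. The following is a quick numerical check on a grid; it is a sanity check, not a proof, and the grid is an arbitrary choice.

```python
import math

def g(b):
    return -math.log(b) + b - 1.0

def ell(x):
    return x * math.log(x) - x + 1.0 if x > 0 else 1.0  # convention 0 log 0 = 0

xs = [k / 100 for k in range(1, 100_001)] + [1e4, 1e6, 1e9]   # 0.01, 0.02, ..., 1000, then a few large values
identity_ok = all(abs(g(x) - x * ell(1 / x)) <= 1e-9 * max(1.0, x) for x in xs)
bound_ok = all(x <= max(50.0, 10 * g(x) / 9) for x in xs)
print(identity_ok, bound_ok)
```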
We next explore the relation between $\xi$ and $\mu$. In order to establish (4.11), it suffices to show that for any bounded continuous function $f$ on $S$

$$\int_{S\times\mathbb{R}_+}u\,f(x)\,\xi(dx,du)=A\int_S f(x)\,[\mu]_1(dx),$$

and by splitting $f$ into its positive and negative parts we may assume $f\ge 0$. By the definitions of $\xi_T$ and $\mu_T$,

$$\int_{S\times\mathbb{R}_+}u\,f(x)\,E\xi_T(dx,du)=\frac{r_T}{T}\int_S f(x)\,[E\mu_T]_1(dx). \tag{4.20}$$

Then (4.18) and (4.19) imply there is a uniform upper bound on

$$\int_{S\times\mathbb{R}_+}\ell(u)\,f(x)\,E\xi_T(dx,du).$$

If we consider $f(x)\,E\xi_T(dx,du)$ as a sequence of measures on $S\times[0,\infty]$ with bounded total mass, then $f(x)\,E\xi_T(dx,du)$ converges weakly to $f(x)\,\xi(dx,du)$. Since $\ell$ is superlinear, [4, Theorem A.3.19] implies that

$$\lim_{T\to\infty}\int_{S\times\mathbb{R}_+}u\,f(x)\,E\xi_T(dx,du)=\int_{S\times\mathbb{R}_+}u\,f(x)\,\xi(dx,du).$$

Using

$$\lim_{T\to\infty}\frac{r_T}{T}\int_S f(x)\,[E\mu_T]_1(dx)=A\int_S f(x)\,[\mu]_1(dx)$$

and (4.20) we arrive at (4.11). ■

4.2.4 The term $E\bar\eta_T$

Lemma 4.4 Given any sequence of controls $\{\bar\alpha_i,\bar\sigma_i\}$, fix a subsequence of $T$ for which the conclusions in Lemma 4.2 hold. Then we can extract a further subsequence along which

$$\liminf_{T\to\infty}E\left[F(\bar\eta_T)\right]\ge F(\eta)$$

for some probability measure $\eta$ on $S$, which is related to $\xi$ in Lemma 4.2 by

$$q(x)\,\eta(dx)=[\xi]_1(dx). \tag{4.21}$$

Proof. As a sequence of probability measures on the compact space $S$, we can always extract a subsequence of $T$ such that $E\bar\eta_T$ converges weakly to a probability measure on $S$ which we denote by $\eta$. The convexity and lower semicontinuity of $F$ imply that

$$\liminf_{T\to\infty}E\left[F(\bar\eta_T)\right]\ge\liminf_{T\to\infty}F(E\bar\eta_T)\ge F(\eta).$$
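By the definitions of $\bar\eta_T$ in (4.6) and $A_T$ in (4.14), and since $q(\bar X_{i-1})\,\bar s_i=\bar\tau_i$,

$$q(x)\,E\bar\eta_T(dx)=\frac{1}{T}E\left[\sum_{i=1}^{r_T-1}\bar\tau_i\,\delta_{\bar X_{i-1}}(dx)+A_T\,\delta_{\bar X_{r_T-1}}(dx)\right]=\frac{1}{T}E\left[\sum_{i=1}^{r_T-1}\bar m_i\,\delta_{\bar X_{i-1}}(dx)+A_T\,\delta_{\bar X_{r_T-1}}(dx)\right],$$

where the last equality uses $E[\bar\tau_i\mid\mathcal{F}_{i-1}]=\bar m_i$ for $i\le r_T-1$. Recalling the definition of $\xi_T$ in (4.17), we have

$$[E\xi_T]_1(dx)=\frac{1}{T}\sum_{i=1}^{r_T}E\left[\delta_{\bar X_{i-1}}(dx)\,\bar m_i\right].$$

This implies the total variation bound

$$\left\|q(x)\,E\bar\eta_T(dx)-[E\xi_T]_1(dx)\right\|_{TV}\le\frac{1}{T}E\left|A_T-\bar m_{r_T}\right|.$$

Recalling the definition of $\bar m_{r_T}$ in (4.15), we have $|A_T-\bar m_{r_T}|\le 1$ in each of the three cases, and we conclude that

$$\left\|q(x)\,E\bar\eta_T(dx)-[E\xi_T]_1(dx)\right\|_{TV}\le\frac{1}{T}.$$

By taking limits we arrive at (4.21). ■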

Lemma 4.1, Lemma 4.2 and Lemma 4.4 together imply that for a sequence of controls $\{\bar\alpha_i,\bar\sigma_i\}$ in (4.5), along any subsequence of $T$ such that $r_T/T\to A$ we can extract a further subsequence along which

$$\liminf_{T\to\infty}E\left[F(\bar\eta_T)+RE^1_T+RE^2_T\right]\ge F(\eta)+A\,R\big(\mu\|[\mu]_1\otimes a\big)+\int_{S\times\mathbb{R}_+}\ell(u)\,\xi(dx,du), \tag{4.22}$$

where $\eta$, $\mu$ and $\xi$ satisfy the constraints (4.11), (4.21), and (4.10) if $A>0$.

Recall that our goal is to prove (4.8). Hence we need to establish the relationship between the right hand side of (4.22) and the rate function $I$ defined in Section 3.1.



4.2.5 Properties of the rate function $I$

We prove the following lemma, for which we adopt the convention $0\cdot\infty=0$. This is in fact the key link, showing that the rate function that is naturally obtained by the weak convergence analysis used to prove the upper bound in fact equals $I$ for suitable measures, and also indicating how to construct controls to prove the lower bound for this same collection of measures. Note that the constraints appearing in the lemma hold for the subsequence appearing in (4.22) due to Lemmas 4.1, 4.2 and 4.4.
Lemma 4.5 Let $I(\eta)$ be defined by (3.2). Suppose that $\eta\ll\pi$, that $\mu$ and $\xi$ satisfy the constraints

$$q(x)\,\eta(dx)=[\xi]_1(dx)\quad\text{and}\quad\int_{\mathbb{R}_+}u\,\xi(dx,du)=A\,[\mu]_1(dx), \tag{4.23}$$

and that when $A>0$ the constraint $[\mu]_1=[\mu]_2$ is also true. Then

$$I(\eta)\le A\,R\big(\mu\|[\mu]_1\otimes a\big)+\int_{S\times\mathbb{R}_+}\ell(u)\,\xi(dx,du). \tag{4.24}$$

Moreover,

$$I(\eta)=\inf\left\{A\,R\big(\mu\|[\mu]_1\otimes a\big)+\int_{S\times\mathbb{R}_+}\ell(u)\,\xi(dx,du)\right\},$$

where the infimum is over all possible choices of $A\ge 0$, $\mu$ and $\xi$ satisfying these constraints.

The proof of this lemma is somewhat lengthy. We present it here instead of in an appendix because of the previously mentioned fact: the construction of $A$, $\mu$ and $\xi$ that minimize the right hand side of (4.24) indicates how to hit target measures $\eta$ that are absolutely continuous with respect to the invariant measure. That construction is used in the proof of the Laplace lower bound.

Proof. We first prove the inequality (4.24). If the right hand side of (4.24) is $\infty$, there is nothing to prove. Hence we assume it is finite. First assume $A>0$, in which case $R(\mu\,\|\,[\mu]_1\otimes\alpha)<\infty$. Define

$$Q=\int_S q(x)\,\pi(dx), \tag{4.25}$$

$$\bar\pi(dx)=q(x)\,\pi(dx)/Q. \tag{4.26}$$

Since $\bar\pi$ is invariant under $\alpha$, by [?, Lemma 8.6.2] $[\mu]_1\ll\bar\pi$. By (2.1) $q$ is bounded from below, and hence $[\mu]_1\ll\pi$. Recall that the definition of $I$ in (3.2) uses $\theta=d\eta/d\pi$. Define $\Theta=\{x\in S:\theta(x)=0\}$. By (4.23),

$$[\mu]_1(dx)=\frac1A\Big(\int_{\mathbb{R}_+}u\,\xi_{2|1}(du|x)\Big)\,q(x)\,\eta(dx),$$

so that by (4.26)

$$\frac{d[\mu]_1}{d\bar\pi}(x)=\frac QA\Big(\int_{\mathbb{R}_+}u\,\xi_{2|1}(du|x)\Big)\,\theta(x), \tag{4.27}$$

where for a measure $\nu$ on $S\times\mathbb{R}_+$, $\nu_{2|1}$ denotes the regular conditional distribution of the second argument given the first. Thus $[\mu]_1(\Theta)=0$. Now suppose that

$$\int_{S\times S}\theta^{1/2}(x)\,\theta^{1/2}(y)\,q(x)\,\alpha(x,dy)\,\pi(dx)=0.$$
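Then $\alpha(x,\Theta)=1$ for $\pi$-a.e. $x\in S\setminus\Theta$, and hence $([\mu]_1\otimes\alpha)[(S\setminus\Theta)\times\Theta]=1$. On the other hand, $\mu((S\setminus\Theta)\times\Theta)=0$ because $[\mu]_2(\Theta)=[\mu]_1(\Theta)=0$. Thus $\mu$ charges only $(S\setminus\Theta)\times(S\setminus\Theta)$, a set of $[\mu]_1\otimes\alpha$-measure zero. So $\mu$ is not absolutely continuous with respect to $[\mu]_1\otimes\alpha$, which contradicts $R(\mu\,\|\,[\mu]_1\otimes\alpha)<\infty$. We conclude that

$$\int_{S\times S}\theta^{1/2}(x)\,\theta^{1/2}(y)\,q(x)\,\alpha(x,dy)\,\pi(dx)>0.$$

Lemma 3.2 implies that

$$\begin{aligned}
-\log\int_{S\times S}\theta^{1/2}(x)\,\theta^{1/2}(y)\,\alpha(x,dy)\,\bar\pi(dx)
&=-\log\int_{S\times S}e^{\frac12[\log\theta(x)+\log\theta(y)]}\,(\bar\pi\otimes\alpha)(dx,dy)\\
&\le R(\mu\,\|\,\bar\pi\otimes\alpha)-\frac12\int_{S\times S}[\log\theta(x)+\log\theta(y)]\,\mu(dx,dy).
\end{aligned}\tag{4.28}$$

Strictly speaking, the inequality above does not fall into the framework of Lemma 3.2 because $\log\theta$ is not bounded. However, going through the proof of this lemma [?, Proposition 1.4.2] shows that the inequality holds as long as the right hand side is not of the form $\infty-\infty$. Towards this end, it suffices to prove

$$\frac12\int_{S\times S}[\log\theta(x)+\log\theta(y)]\,\mu(dx,dy)=\int_S\log\theta(x)\,[\mu]_1(dx)<\infty. \tag{4.29}$$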

In the appendix we will prove (this being the only place where Condition 2.5 is used) that

$$R([\mu]_1\,\|\,\bar\pi)<\infty. \tag{4.30}$$

For now, we assume this is true. Using (4.25), (4.26), and (4.27) to evaluate the relative entropy,

$$\infty>R([\mu]_1\,\|\,\bar\pi)=\int_S\log\Big(\int_{\mathbb{R}_+}u\,\xi_{2|1}(du|x)\Big)[\mu]_1(dx)+\int_S\log\theta(x)\,[\mu]_1(dx)+\log\frac QA. \tag{4.31}$$

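We know from (2.1) that $Q\ge K_1$. Also, by (4.23) and the nonnegativity of $\ell$,

$$\begin{aligned}
\int_S\log\Big(\int_{\mathbb{R}_+}u\,\xi_{2|1}(du|x)\Big)[\mu]_1(dx)
&=\frac1A\int_S\Big(\int_{\mathbb{R}_+}u\,\xi_{2|1}(du|x)\Big)\log\Big(\int_{\mathbb{R}_+}u\,\xi_{2|1}(du|x)\Big)[\xi]_1(dx)\\
&=\frac1A\int_S\ell\Big(\int_{\mathbb{R}_+}u\,\xi_{2|1}(du|x)\Big)[\xi]_1(dx)+\frac1A\int_{S\times\mathbb{R}_+}u\,\xi(dx,du)-\frac1A\int_S[\xi]_1(dx)\\
&=\frac1A\int_S\ell\Big(\int_{\mathbb{R}_+}u\,\xi_{2|1}(du|x)\Big)[\xi]_1(dx)+\int_S[\mu]_1(dx)-\frac1A\int_S q(x)\,\eta(dx)\\
&\ge1-\frac{K_2}{A}.
\end{aligned}\tag{4.32}$$

Each step is justified as follows:

- the first equality uses the second constraint in (4.23);
- the second equality uses the definition of $\ell$;
- the third equality uses both parts of (4.23);
- the final inequality uses the nonnegativity of $\ell$ and the upper bound $q\le K_2$ in (2.1).

Thus rearranging (4.31) gives (4.29).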
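The chain rule of relative entropy gives

$$\begin{aligned}
R(\mu\,\|\,\bar\pi\otimes\alpha)-\frac12\int_{S\times S}[\log\theta(x)+\log\theta(y)]\,\mu(dx,dy)
&=R([\mu]_1\,\|\,\bar\pi)+\int_S R\big(\mu_{2|1}(\cdot|x)\,\|\,\alpha(x,\cdot)\big)[\mu]_1(dx)-\int_S\log\theta(x)\,[\mu]_1(dx)\\
&=R([\mu]_1\,\|\,\bar\pi)+R(\mu\,\|\,[\mu]_1\otimes\alpha)-\int_S\log\theta(x)\,[\mu]_1(dx).
\end{aligned}\tag{4.33}$$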
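On a finite state space every measure in (4.33) is a vector or a matrix, and the chain rule becomes an identity between finite sums. The sketch below checks it numerically. The arrays `pibar`, `alpha` and `mu` are randomly generated stand-ins for $\bar\pi$, $\alpha$ and $\mu$, not objects from the paper. The $\log\theta$ terms are left out because they agree on both sides once $[\mu]_1=[\mu]_2$.

```python
import numpy as np

def kl(p, q):
    """Relative entropy R(p || q) of nonnegative arrays with p << q, using 0 log 0 = 0."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

rng = np.random.default_rng(0)
n = 4
pibar = rng.random(n); pibar /= pibar.sum()                              # stand-in for pi-bar
alpha = rng.random((n, n)); alpha /= alpha.sum(axis=1, keepdims=True)   # stand-in kernel alpha(x, y)
mu = rng.random((n, n)); mu /= mu.sum()                                  # stand-in joint law on S x S

mu1 = mu.sum(axis=1)            # first marginal [mu]_1
cond = mu / mu1[:, None]        # conditional law mu_{2|1}(. | x), one row per x

lhs = kl(mu.ravel(), (pibar[:, None] * alpha).ravel())                   # R(mu || pibar x alpha)
middle = kl(mu1, pibar) + sum(mu1[x] * kl(cond[x], alpha[x]) for x in range(n))
rhs = kl(mu1, pibar) + kl(mu.ravel(), (mu1[:, None] * alpha).ravel())    # R([mu]_1||pibar) + R(mu||[mu]_1 x alpha)
print(lhs, middle, rhs)
```

All three printed numbers agree up to rounding, which is the content of the two equalities in (4.33).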

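By (4.31), (4.32) and the convexity of $\ell$,

$$\begin{aligned}
R([\mu]_1\,\|\,\bar\pi)-\int_S\log\theta(x)\,[\mu]_1(dx)
&=\int_S\log\Big(\int_{\mathbb{R}_+}u\,\xi_{2|1}(du|x)\Big)[\mu]_1(dx)+\log\frac QA\\
&=\frac1A\int_S\ell\Big(\int_{\mathbb{R}_+}u\,\xi_{2|1}(du|x)\Big)[\xi]_1(dx)+\int_S[\mu]_1(dx)-\frac1A\int_S q(x)\,\eta(dx)+\log\frac QA\\
&\le\frac1A\int_{S\times\mathbb{R}_+}\ell(u)\,\xi(dx,du)+1-\frac1A\int_S q(x)\,\eta(dx)+\log\frac QA.
\end{aligned}\tag{4.34}$$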

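In summary, (4.28), (4.33) and (4.34) imply

$$\begin{aligned}
-\log\int_{S\times S}\theta^{1/2}(x)\,\theta^{1/2}(y)\,q(x)\,\alpha(x,dy)\,\pi(dx)
&=-\log\int_{S\times S}\theta^{1/2}(x)\,\theta^{1/2}(y)\,\alpha(x,dy)\,\bar\pi(dx)-\log Q\\
&\le R(\mu\,\|\,[\mu]_1\otimes\alpha)+\frac1A\int_{S\times\mathbb{R}_+}\ell(u)\,\xi(dx,du)+1-\frac1A\int_S q(x)\,\eta(dx)+\log\frac1A.
\end{aligned}$$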

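Thus

$$-\int_{S\times S}\theta^{1/2}(x)\,\theta^{1/2}(y)\,q(x)\,\alpha(x,dy)\,\pi(dx)\le-\exp\Big\{-\Big(R(\mu\,\|\,[\mu]_1\otimes\alpha)+\frac1A\int_{S\times\mathbb{R}_+}\ell(u)\,\xi(dx,du)+1-\frac1A\int_S q(x)\,\eta(dx)+\log\frac1A\Big)\Big\}.$$

Inequality (4.24) then follows from the fact that $-e^{-r}\le ar+a\log a-a$ for any $r\in\mathbb{R}$ and $a\in\mathbb{R}_+$, by taking $a=A$ and

$$r=R(\mu\,\|\,[\mu]_1\otimes\alpha)+\frac1A\int_{S\times\mathbb{R}_+}\ell(u)\,\xi(dx,du)+1-\frac1A\int_S q(x)\,\eta(dx)+\log\frac1A.$$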
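For completeness, the substitution in this last step can be written out. It only expands the preceding sentence, using the expression for $I(\eta)$ in (3.2). The elementary inequality holds because $r\mapsto ar+e^{-r}$ attains its minimum $a-a\log a$ at $r=-\log a$.

$$\begin{aligned}
I(\eta)&=\int_S q(x)\,\eta(dx)-\int_{S\times S}\theta^{1/2}(x)\,\theta^{1/2}(y)\,q(x)\,\alpha(x,dy)\,\pi(dx)\\
&\le\int_S q(x)\,\eta(dx)-e^{-r}\le\int_S q(x)\,\eta(dx)+Ar+A\log A-A\\
&=A\,R(\mu\,\|\,[\mu]_1\otimes\alpha)+\int_{S\times\mathbb{R}_+}\ell(u)\,\xi(dx,du),
\end{aligned}$$

where the last line uses $A\log(1/A)+A\log A=0$. The terms $\pm A$ and $\pm\int_S q\,d\eta$ cancel as well.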
For the case when $A=0$, (4.23) implies that $\int_{\mathbb{R}_+}u\,\xi(dx,du)=0$. This means that $\int_{\mathbb{R}_+}u\,\xi_{2|1}(du|x)=0$ for $[\xi]_1$-a.e. $x$. Hence, by the convexity of $\ell$, the value $\ell(0)=1$ and $q(x)\,\eta(dx)=[\xi]_1(dx)$,

$$\begin{aligned}
\int_{S\times\mathbb{R}_+}\ell(u)\,\xi(dx,du)&\ge\int_S\ell\Big(\int_{\mathbb{R}_+}u\,\xi_{2|1}(du|x)\Big)[\xi]_1(dx)=\int_S[\xi]_1(dx)=\int_S q(x)\,\eta(dx)\\
&\ge\int_S q(x)\,\eta(dx)-\int_{S\times S}\theta^{1/2}(x)\,\theta^{1/2}(y)\,q(x)\,\alpha(x,dy)\,\pi(dx)=I(\eta).
\end{aligned}$$

Thus (4.24) also holds in this case, which completes the proof of the first part of Lemma 4.5.



We now turn to the second part of Lemma 4.5. The definitions and constructions used here will also be used to construct what are essentially optimal controls to prove the reverse inequality in the next section. Indeed, the particular forms of the definitions are suggested by that use. In particular, $A\kappa(x)$ will correspond to a dilation of the mean for the exponential random variables. In light of the second part of Lemma 3.2 we define $\mu$ by

$$\frac{d\mu}{d(\bar\pi\otimes\alpha)}(x,y)=\theta^{1/2}(x)\,\theta^{1/2}(y)\Big/\int_{S\times S}\theta^{1/2}(x)\,\theta^{1/2}(y)\,\alpha(x,dy)\,\bar\pi(dx). \tag{4.35}$$



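Note that the Cauchy–Schwarz inequality, the detailed balance condition and the relation between $\pi$ and $\bar\pi$ (see (2.2)) imply

$$\int_{S\times S}\theta^{1/2}(x)\,\theta^{1/2}(y)\,(\bar\pi\otimes\alpha)(dx,dy)\le\int_{S\times S}\theta(x)\,\alpha(x,dy)\,\bar\pi(dx)=\frac1Q\int_S q(x)\,\eta(dx)<\infty.$$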

Hence $\mu$ is well defined and $[\mu]_1=[\mu]_2$. Then Lemma 3.2 implies that

$$-\log\int_{S\times S}\theta^{1/2}(x)\,\theta^{1/2}(y)\,\alpha(x,dy)\,\bar\pi(dx)=R(\mu\,\|\,\bar\pi\otimes\alpha)-\int_S\log\theta(x)\,[\mu]_1(dx). \tag{4.36}$$
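Identity (4.36) is the equality case of Lemma 3.2 for the tilted measure (4.35), and on a finite state space it can be checked directly. The sketch below makes several assumptions, all of them hypothetical test data:

- the reversible kernel is built from a random symmetric weight matrix `W`, so that $\bar\pi\otimes\alpha$ is symmetric, which is what detailed balance provides;
- `theta` is an arbitrary positive vector (the identity does not need it to be a normalized density).

```python
import numpy as np

def kl(p, q):
    """Relative entropy R(p || q) of nonnegative arrays with p << q, using 0 log 0 = 0."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

rng = np.random.default_rng(1)
n = 5
W = rng.random((n, n)); W = W + W.T          # symmetric weights -> reversible embedded chain
alpha = W / W.sum(axis=1, keepdims=True)     # transition kernel alpha(x, y)
pibar = W.sum(axis=1) / W.sum()              # reversible, hence invariant, law of alpha
theta = rng.random(n) + 0.1                  # any strictly positive function works here

nu = pibar[:, None] * alpha                  # reference measure pibar x alpha (a symmetric matrix)
w = np.sqrt(np.outer(theta, theta))          # theta^{1/2}(x) theta^{1/2}(y)
Z = float(np.sum(w * nu))
mu = w * nu / Z                              # tilted measure (4.35)
mu1 = mu.sum(axis=1)

lhs = -np.log(Z)
rhs = kl(mu.ravel(), nu.ravel()) - float(np.sum(np.log(theta) * mu1))
print(lhs, rhs)
```

Symmetry of `mu` is what turns $\frac12\int[\log\theta(x)+\log\theta(y)]\,d\mu$ into $\int\log\theta\,d[\mu]_1$.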

If $R(\mu\,\|\,\bar\pi\otimes\alpha)=\infty$ or $-\int_S\log\theta(x)\,[\mu]_1(dx)=\infty$, the last display implies

$$\int_{S\times S}\theta^{1/2}(x)\,\theta^{1/2}(y)\,q(x)\,\alpha(x,dy)\,\pi(dx)=0.$$

Let $A=0$ and $\xi(dx,du)=q(x)\,\eta(dx)\,\delta_0(du)$. Then $\xi$ and $\mu$ satisfy (4.23), and

$$A\,R(\mu\,\|\,[\mu]_1\otimes\alpha)+\int_{S\times\mathbb{R}_+}\ell(u)\,\xi(dx,du)=\int_S q(x)\,\eta(dx)=I(\eta).$$
JsxK+ Js 
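Next assume $R(\mu\,\|\,\bar\pi\otimes\alpha)<\infty$ and $-\int_S\log\theta(x)\,[\mu]_1(dx)<\infty$. Define $A$ by

$$A=\exp\Big\{-\Big[R(\mu\,\|\,\bar\pi\otimes\alpha)-\int_S\log\theta(x)\,[\mu]_1(dx)\Big]+\log Q\Big\}. \tag{4.37}$$

Define the measure

$$\rho(dx)=q(x)\,\theta(x)\,\pi(dx) \tag{4.38}$$

and $\kappa=d[\mu]_1/d\rho$. Then for any $x\in S\setminus\Theta$ (recall $\Theta=\{x\in S:\theta(x)=0\}$)

$$\kappa(x)=\frac{1}{Q\,\theta(x)}\,\frac{d[\mu]_1}{d\bar\pi}(x). \tag{4.39}$$

In addition

$$\int_S\kappa(x)\log\kappa(x)\,\rho(dx)=\int_S\log\kappa(x)\,[\mu]_1(dx)=R([\mu]_1\,\|\,\bar\pi)-\int_S\log\theta(x)\,[\mu]_1(dx)-\log Q. \tag{4.40}$$

Define

$$b(x)=\begin{cases}0&\text{for }x\in\Theta,\\ A\kappa(x)&\text{for }x\notin\Theta,\end{cases} \tag{4.41}$$

and

$$\xi(dx,du)=q(x)\,\eta(dx)\,\delta_{b(x)}(du). \tag{4.42}$$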

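Then $\xi$ satisfies the first part of (4.23). To see that the second part of (4.23) is satisfied, note that

$$[\mu]_1(\Theta)=0=\int_{\Theta\times\mathbb{R}_+}u\,\xi(dx,du)$$

and, for $x\in S\setminus\Theta$,

$$\int_{\mathbb{R}_+}u\,\xi(dx,du)=b(x)\,q(x)\,\eta(dx)=A\kappa(x)\,q(x)\,\theta(x)\,\pi(dx)=A\,[\mu]_1(dx).$$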
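By using the definitions we arrive at the following, each line of which is explained below:

$$\begin{aligned}
&A\,R(\mu\,\|\,[\mu]_1\otimes\alpha)+\int_{S\times\mathbb{R}_+}\ell(u)\,\xi(dx,du)\\
&\quad=A\,R(\mu\,\|\,[\mu]_1\otimes\alpha)+\int_S\ell(b(x))\,q(x)\,\eta(dx)\\
&\quad=A\,R(\mu\,\|\,[\mu]_1\otimes\alpha)+\int_\Theta q(x)\,\eta(dx)+\int_{S\setminus\Theta}\ell(b(x))\,\rho(dx)\\
&\quad=A\,R(\mu\,\|\,[\mu]_1\otimes\alpha)+\int_S q(x)\,\eta(dx)+A\log A-A+A\int_S\kappa(x)\log\kappa(x)\,\rho(dx)\\
&\quad=\int_S q(x)\,\eta(dx)+A\log A-A+A\Big[R(\mu\,\|\,\bar\pi\otimes\alpha)-\int_S\log\theta(x)\,[\mu]_1(dx)-\log Q\Big]\\
&\quad=\int_S q(x)\,\eta(dx)-A.
\end{aligned}$$

The steps are justified as follows:

- the first equality uses (4.42);
- the second equality uses (4.41);
- the third equality uses (4.41) again, expands $\ell$, and uses $\kappa=d[\mu]_1/d\rho$ and $\eta(\Theta)=\rho(\Theta)=0$;
- the fourth equality uses (4.40) together with the chain rule $R(\mu\,\|\,\bar\pi\otimes\alpha)=R([\mu]_1\,\|\,\bar\pi)+R(\mu\,\|\,[\mu]_1\otimes\alpha)$;
- the fifth equality follows from (4.37).

Note that (4.36) and (4.26) imply

$$A=\int_{S\times S}\theta^{1/2}(x)\,\theta^{1/2}(y)\,q(x)\,\alpha(x,dy)\,\pi(dx). \tag{4.43}$$

Hence we obtain

$$A\,R(\mu\,\|\,[\mu]_1\otimes\alpha)+\int_{S\times\mathbb{R}_+}\ell(u)\,\xi(dx,du)=I(\eta).$$

■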
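The whole construction in the second part of the proof can be exercised on a finite state space, where every quantity is explicit. The sketch below is an illustration under assumptions, not part of the argument:

- the embedded chain is reversible and built from a random symmetric matrix;
- the intensities `q` are hypothetical, and we set $\pi\propto\bar\pi/q$ so that (4.26) holds;
- the target `eta` has strictly positive density, so $\Theta=\emptyset$.

It checks that $A\,R(\mu\,\|\,[\mu]_1\otimes\alpha)+\int\ell(A\kappa)\,q\,d\eta$ equals the finite-state value of (3.2).

```python
import numpy as np

def kl(p, q):
    """Relative entropy R(p || q) of nonnegative arrays with p << q, using 0 log 0 = 0."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def ell(u):
    """ell(u) = u log u - u + 1 for u > 0 (all arguments below are positive)."""
    return u * np.log(u) - u + 1.0

rng = np.random.default_rng(2)
n = 5
W = rng.random((n, n)); W = W + W.T
alpha = W / W.sum(axis=1, keepdims=True)     # reversible embedded chain
pibar = W.sum(axis=1) / W.sum()              # its invariant law
q = rng.random(n) + 0.5                      # hypothetical jump intensities
pi = (pibar / q) / np.sum(pibar / q)         # so that pibar = q * pi / Q as in (4.26)
eta = rng.random(n) + 0.1; eta /= eta.sum()  # hypothetical target with theta > 0 (Theta empty)
theta = eta / pi

w = np.sqrt(np.outer(theta, theta))
# Rate function (3.2) on a finite state space.
I = float(np.sum(q * eta) - np.sum(w * (q * pi)[:, None] * alpha))

Q = float(np.sum(q * pi))                          # (4.25)
nu = (q * pi / Q)[:, None] * alpha                 # pibar x alpha via (4.26)
mu = w * nu / np.sum(w * nu)                       # (4.35)
mu1 = mu.sum(axis=1)
A = float(np.sum(w * (q * pi)[:, None] * alpha))   # (4.43)
rho = q * theta * pi                               # (4.38)
kappa = mu1 / rho                                  # kappa = d[mu]_1 / d rho

cost = A * kl(mu.ravel(), (mu1[:, None] * alpha).ravel()) + float(np.sum(ell(A * kappa) * rho))
print(cost, I)
```

Here $\int\ell(u)\,\xi(dx,du)$ reduces to $\sum_x\ell(A\kappa(x))\,\rho(x)$ because $\xi$ puts a point mass at $b(x)=A\kappa(x)$ and $\rho=q\,\eta$.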

The representation formula (4.4), the lower bound (4.22) and Lemma 4.5 together give

$$\liminf_{T\to\infty}-\frac1T\log E\Big[\exp\Big\{-TF(\eta_T)-T\cdot\infty\cdot1_{\{r_T/T\}^c}(R_T/T)\Big\}\Big]\ge\inf_{\eta\in\mathcal{P}(S)}\big[F(\eta)+I(\eta)\big]. \tag{4.44}$$

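4.3 Combining the cases

In the last section, we showed that (4.44) is valid for any sequence $\{r_T\}$ such that $r_T/T\to A\in[0,C]$. An argument by contradiction shows that the bound is uniform in $A$. Thus

$$\begin{aligned}
&\liminf_{T\to\infty}-\frac1T\log\Big\{\sum_{r_T=1}^{CT}E\Big[\exp\Big\{-TF(\eta_T)-T\cdot\infty\cdot1_{\{r_T/T\}^c}(R_T/T)\Big\}\Big]\Big\}\\
&\quad\ge\liminf_{T\to\infty}-\frac1T\log\Big\{CT\cdot\max_{1\le r_T\le CT}E\Big[\exp\Big\{-TF(\eta_T)-T\cdot\infty\cdot1_{\{r_T/T\}^c}(R_T/T)\Big\}\Big]\Big\}\\
&\quad\ge\inf_{\eta\in\mathcal{P}(S)}\big[F(\eta)+I(\eta)\big].
\end{aligned}$$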
We now partition $E[\exp\{-TF(\eta_T)\}]$ according to the two cases ($R_T\le CT$ and $R_T>CT$) to obtain the overall lower bound

$$\liminf_{T\to\infty}-\frac1T\log E\big[\exp\{-TF(\eta_T)\}\big]\ge\min\Big\{\inf_{\eta\in\mathcal{P}(S)}\big[F(\eta)+I(\eta)\big],\ C+C\log C-K_2-C\log K_2\Big\}.$$

Letting $C\to\infty$ we have the desired Laplace upper bound

$$\liminf_{T\to\infty}-\frac1T\log E\big[\exp\{-TF(\eta_T)\}\big]\ge\inf_{\eta\in\mathcal{P}(S)}\big[F(\eta)+I(\eta)\big]. \tag{4.45}$$

5 Proof of Laplace lower bound

We turn to the proof of the reverse inequality

$$\limsup_{T\to\infty}-\frac1T\log E\big[\exp\{-TF(\eta_T)\}\big]\le\inf_{\eta\in\mathcal{P}(S)}\big[F(\eta)+I(\eta)\big]. \tag{5.1}$$

Let $F$ be a nonnegative bounded and continuous function. Fix an arbitrary $\varepsilon>0$ and choose $\eta$ such that

$$F(\eta)+I(\eta)\le\inf_{\nu\in\mathcal{P}(S)}\big[F(\nu)+I(\nu)\big]+\varepsilon. \tag{5.2}$$

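As pointed out in Remark 2.8, $\mathcal{H}$ defined in (3.1) is dense in $\mathcal{P}(S)$. Since $I$ was extended from $\mathcal{H}$ to $\mathcal{P}(S)$ via lower semicontinuous regularization, we can assume without loss of generality that $\eta\ll\pi$. Define $\theta=d\eta/d\pi$. We now argue that we can further assume there exists $\delta>0$ such that

$$\delta\le\theta(x)\le1/\delta \tag{5.3}$$

for all $x\in S$. If $\eta^\delta=(1-\delta)\eta+\delta\pi$, then $d\eta^\delta/d\pi\ge\delta$. The continuity of $F$ and the convexity of $I$ then imply that the difference between $F(\eta^\delta)+I(\eta^\delta)$ and $F(\eta)+I(\eta)$ can be made arbitrarily small. Convexity is used together with $I(\pi)=0$, which give $I(\eta^\delta)\le(1-\delta)I(\eta)$.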
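The two facts used here, $d\eta^\delta/d\pi\ge\delta$ and $I(\eta^\delta)\le(1-\delta)I(\eta)$, can be sanity-checked on a finite state space. The sketch below evaluates the finite-state form of (3.2) on a hypothetical reversible chain. The target `eta` vanishes on most states, so it is far from $\pi$. All inputs are illustrative stand-ins.

```python
import numpy as np

def rate(eta, pi, q, alpha):
    """Finite-state version of (3.2) with theta = eta / pi."""
    theta = eta / pi
    overlap = np.sqrt(np.outer(theta, theta)) * (q * pi)[:, None] * alpha
    return float(np.sum(q * eta) - np.sum(overlap))

rng = np.random.default_rng(3)
n = 6
W = rng.random((n, n)); W = W + W.T              # symmetric weights: reversible embedded chain
alpha = W / W.sum(axis=1, keepdims=True)
q = rng.random(n) + 0.5                          # hypothetical intensities
pibar = W.sum(axis=1) / W.sum()
pi = (pibar / q) / np.sum(pibar / q)             # invariant law of the jump process
eta = np.zeros(n); eta[0], eta[1] = 0.7, 0.3     # hypothetical target supported on two states

results = []
for delta in (0.5, 0.1, 0.01):
    eta_d = (1 - delta) * eta + delta * pi       # the mixture eta^delta
    results.append((delta, float(np.min(eta_d / pi)), rate(eta_d, pi, q, alpha), rate(eta, pi, q, alpha)))
    print(results[-1])
```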

Thus we can assume $\theta$ is uniformly bounded from below away from zero. Let $n\in\mathbb{N}$, and define

$$\eta^n(dx)=\theta(x)\,1_{\{\theta(x)\le n\}}\,\pi(dx)+\frac{\eta(\{x:\theta(x)>n\})}{\pi(\{x:\theta(x)>n\})}\,1_{\{\theta(x)>n\}}\,\pi(dx).$$

Then $d\eta^n/d\pi\le[\eta(\{x:\theta(x)>n\})/\pi(\{x:\theta(x)>n\})]\vee n$. Since $\eta\ll\pi$ implies $\eta(\{x:\theta(x)>n\})\to0$, $\eta^n$ converges weakly to $\eta$. It then follows from the continuity of $F$ and the lower semicontinuity of $I$ that we can choose $\eta$ satisfying (5.2) with $2\varepsilon$ replacing $\varepsilon$ and also (5.3). Hence we assume $\eta$ satisfies (5.2) and (5.3). Furthermore, by Lusin's Theorem [7, Theorem 7.10], we can also assume that $\theta$ is continuous.

The proof of the lower bound will use the following representation. The infimum in the representation is taken over all control measures $\{\bar\alpha_i,\bar\sigma_i\}$. The properties of such measures, and how $\bar\eta_T$ and $\bar R_T$ are constructed from them, were discussed immediately above the similar representation (4.4). The proof of the lemma is given in the appendix.



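Lemma 5.1 Let $F:\mathcal{P}(S)\to\mathbb{R}$ be bounded and continuous. Then

$$-\frac1T\log E\big[\exp\{-TF(\eta_T)\}\big]=\inf E\Big[F(\bar\eta_T)+\frac1T\sum_{i=1}^{\bar R_T}\big(R(\bar\alpha_{i-1}\,\|\,\alpha)+R(\bar\sigma_i\,\|\,\sigma)\big)\Big],$$

where the infimum is taken over all control measures $\{\bar\alpha_i,\bar\sigma_i\}$.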

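Suppose that, given any measure $\eta\in\mathcal{P}(S)$ satisfying (5.2) and (5.3), one can construct $\bar\alpha_i$ and $\bar\sigma_i$ with the following property: any subsequence of $T$ has a further subsequence $T_n$ such that

$$\lim_{n\to\infty}E\Big[F(\bar\eta_{T_n})+\frac1{T_n}\sum_{i=1}^{\bar R_{T_n}}\big(R(\bar\alpha_{i-1}\,\|\,\alpha)+R(\bar\sigma_i\,\|\,\sigma)\big)\Big]=F(\eta)+I(\eta).$$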

Then Lemma 5.1 implies the Laplace lower bound (5.1). The construction of suitable $\bar\alpha_i$ and $\bar\sigma_i$ relies on many of the same constructions as those used in the proof of the second part of Lemma 4.5. We first define $\mu\in\mathcal{P}(S\times S)$ as in (4.35). Then automatically $[\mu]_1=[\mu]_2$. Hence, if we define $p$ as the regular conditional probability such that $\mu=[\mu]_1\otimes p$, then $[\mu]_1$ is invariant under $p$ [?, Lemma 8.5.1(a)]. Define $\bar\alpha_i=p$ for each $i$, and let $\{\bar X_i\}$ be the corresponding Markov chain. Next define $\rho(dx)=q(x)\,\eta(dx)$ and

$$\kappa(x)=\frac{d[\mu]_1}{d\rho}(x)=\frac{1}{Q\,\theta(x)}\,\frac{d[\mu]_1}{d\bar\pi}(x). \tag{5.4}$$

By (5.3), there is $M<\infty$ such that $1/M\le\kappa\le M$, and due to the continuity of $\theta$, $\kappa$ is also continuous. Notice that



$$\eta(dx)=\big(q(x)\,\kappa(x)\big)^{-1}[\mu]_1(dx). \tag{5.5}$$

Assumption (5.3) guarantees that

$$-\log\int_{S\times S}\theta^{1/2}(x)\,\theta^{1/2}(y)\,(\bar\pi\otimes\alpha)(dx,dy)<\infty\quad\text{and}\quad-\int_S\log\theta(x)\,[\mu]_1(dx)<\infty,$$

and (4.36) then implies that $R(\mu\,\|\,\bar\pi\otimes\alpha)<\infty$. Define $A$ as in (4.43). For each $i$, let $\bar\sigma_i$ be the exponential distribution with mean $[A\kappa(\bar X_{i-1})]^{-1}$. We can then construct a Markov jump process $\bar X(t)$ using $\bar\alpha_i$ and $\bar\sigma_i$ instead of $\alpha$ and $\sigma$. Its infinitesimal generator $\bar{\mathcal{L}}$ is bounded and continuous and takes the form

$$\bar{\mathcal{L}}f(x)=A\kappa(x)\,q(x)\int_S\big[f(y)-f(x)\big]\,p(x,dy).$$
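On a finite state space the generator $\bar{\mathcal{L}}$ is a matrix, and the claim that $\eta$ is invariant becomes $\eta^{\top}\bar{\mathcal{L}}=0$. The sketch below builds $\mu$, $p$, $A$ and $\kappa$ as in (4.35), (4.43) and (5.4) for a hypothetical reversible chain and a positive target `eta`. It then checks invariance, and also checks that the null space is one-dimensional (all entries of $p$ are positive here), mirroring the uniqueness claim below.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 5
W = rng.random((n, n)); W = W + W.T
alpha = W / W.sum(axis=1, keepdims=True)     # reversible embedded chain (hypothetical)
pibar = W.sum(axis=1) / W.sum()
q = rng.random(n) + 0.5                      # hypothetical jump intensities
pi = (pibar / q) / np.sum(pibar / q)
eta = rng.random(n) + 0.2; eta /= eta.sum()  # hypothetical target, bounded away from zero
theta = eta / pi

w = np.sqrt(np.outer(theta, theta))
nu = pibar[:, None] * alpha
mu = w * nu / np.sum(w * nu)                 # (4.35)
mu1 = mu.sum(axis=1)
p = mu / mu1[:, None]                        # mu = [mu]_1 x p
A = float(np.sum(w * (q * pi)[:, None] * alpha))   # (4.43)
kappa = mu1 / (q * eta)                      # (5.4): kappa = d[mu]_1 / d rho with rho = q eta

# Generator matrix: (L f)(x) = A kappa(x) q(x) sum_y [f(y) - f(x)] p(x, y).
L = (A * kappa * q)[:, None] * (p - np.eye(n))
print(eta @ L)                               # integral of L f against eta, for every f at once
```

The row vector `eta @ L` equals $A([\mu]_2-[\mu]_1)$, which vanishes because $\mu$ is symmetric.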
(5.5) and the fact that $[\mu]_1$ is invariant under $p$ imply $\int_S(\bar{\mathcal{L}}f)(x)\,\eta(dx)=0$, so $\eta$ is an invariant distribution of the continuous time process $\bar X$. We claim that $\eta$ is the unique invariant distribution of $\bar X$. Indeed, by [?, Proposition 4.9.2] any invariant distribution $\nu$ for $\bar X$ satisfies $\int_S(\bar{\mathcal{L}}f)(x)\,\nu(dx)=0$. If we define

$$\bar\nu(dx)=\frac{A\kappa(x)\,q(x)\,\nu(dx)}{\int_S A\kappa(x)\,q(x)\,\nu(dx)},$$

then $\bar\nu$ is invariant under $p$. However, by Condition 2.3 and [?, Lemma 8.6.3(c)] the invariant measure under $p$ is unique, and hence the invariant measure of $\bar X$ is also unique. By the definition of $\bar\eta_T$ in (4.3),

$$\bar\eta_T(\cdot)=\frac1T\int_0^T\delta_{\bar X(t)}(\cdot)\,dt=\frac1T\Big[\sum_{i=1}^{\bar R_T-1}\frac{\bar\tau_i}{q(\bar X_{i-1})}\,\delta_{\bar X_{i-1}}(\cdot)+\Big(T-\sum_{i=1}^{\bar R_T-1}\frac{\bar\tau_i}{q(\bar X_{i-1})}\Big)\delta_{\bar X_{\bar R_T-1}}(\cdot)\Big]. \tag{5.6}$$

Since $S$ is compact we can extract a subsequence of $T$ such that $\bar\eta_T$ converges weakly, and by [?, Theorem 4.9.3] this weak limit is $\eta$. We claim the following along the same subsequence.

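Lemma 5.2 $E[\bar R_T/T]\to A$, $E\big[\sum_{i=1}^{\bar R_T}R(\bar\sigma_i\,\|\,\sigma)/T\big]\to\int_S\ell(A\kappa(x))\,q(x)\,\eta(dx)$ and

$$E\Big[\sum_{i=1}^{\bar R_T}\delta_{\bar X_{i-1}}(dx)\Big/T\Big]\to A\,[\mu]_1(dx).$$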



Proof. As in the proof of the upper bound, a minor nuisance is dealing with the residual time $T-\sum_{i=1}^{\bar R_T-1}\bar\tau_i/q(\bar X_{i-1})$. However, this is more easily controlled here since it is bounded by an exponential with known mean. Since $\bar\eta_T\to\eta$ weakly, we have $\lim_{T\to\infty}E[f(\bar\eta_T)]=f(\eta)$ for any bounded and continuous function $f$ on the space of subprobability measures on $S$. To prove the first part of the lemma, define $f$ by

$$f(\nu)=\int_S\kappa(x)\,q(x)\,\nu(dx).$$

Since both $\kappa$ and $q$ are bounded and continuous, $f$ is also bounded and continuous. Using (5.5),

$$f(\eta)=\int_S\kappa(x)\,q(x)\,\eta(dx)=\int_S[\mu]_1(dx)=1. \tag{5.7}$$
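Thus $\lim_{T\to\infty}E[f(\bar\eta_T)]=1$. Now by (5.6) and the definition of $\bar R_T$,

$$E\Big|f(\bar\eta_T)-\frac1T\sum_{i=1}^{\bar R_T}\frac{\bar\tau_i}{q(\bar X_{i-1})}\,\kappa(\bar X_{i-1})\,q(\bar X_{i-1})\Big|\le\frac1T\,E\big[\bar\tau_{\bar R_T}\,\kappa(\bar X_{\bar R_T-1})\big]\to0$$

as $T\to\infty$. Hence

$$\lim_{T\to\infty}E\Big[\frac1T\sum_{i=1}^{\bar R_T}\bar\tau_i\,\kappa(\bar X_{i-1})\Big]=1.$$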
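Recall that $\mathcal{F}_i$ is the $\sigma$-algebra generated by $\{(\bar X_0,\ldots,\bar X_i),(\bar\tau_1,\ldots,\bar\tau_i)\}$. We have $\{\bar R_T\ge i\}\in\mathcal{F}_{i-1}$ and $E[\bar\tau_i\,|\,\mathcal{F}_{i-1}]=[A\kappa(\bar X_{i-1})]^{-1}$. Therefore

$$E\Big[\frac1T\sum_{i=1}^{\bar R_T}\bar\tau_i\,\kappa(\bar X_{i-1})\Big]=\frac1T\sum_{i=1}^{\infty}E\Big[1_{\{\bar R_T\ge i\}}\,E[\bar\tau_i\,|\,\mathcal{F}_{i-1}]\,\kappa(\bar X_{i-1})\Big]=\frac1T\sum_{i=1}^{\infty}E\Big[\frac{1_{\{\bar R_T\ge i\}}}{A}\Big]=\frac1A\,E\Big[\frac{\bar R_T}{T}\Big].$$

Hence $E[\bar R_T/T]\to A$. This completes the proof of the first statement in the lemma.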
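The proof of the second statement is similar. Define $f$ by

$$f(\nu)=\int_S\ell(A\kappa(x))\,q(x)\,\nu(dx).$$

Then, as before,

$$f(\eta)=\lim_{T\to\infty}E[f(\bar\eta_T)]=\lim_{T\to\infty}E\Big[\frac1T\sum_{i=1}^{\bar R_T}\bar\tau_i\,\ell(A\kappa(\bar X_{i-1}))\Big].$$

By Lemma ?? with $g(x)=x\ell(1/x)$, we have $R(\bar\sigma_i\,\|\,\sigma)=g([A\kappa(\bar X_{i-1})]^{-1})=\ell(A\kappa(\bar X_{i-1}))/(A\kappa(\bar X_{i-1}))$. Therefore

$$E\Big[\frac1T\sum_{i=1}^{\bar R_T}R(\bar\sigma_i\,\|\,\sigma)\Big]=E\Big[\frac1T\sum_{i=1}^{\bar R_T}E[\bar\tau_i\,|\,\mathcal{F}_{i-1}]\,\ell(A\kappa(\bar X_{i-1}))\Big]=E\Big[\frac1T\sum_{i=1}^{\bar R_T}\bar\tau_i\,\ell(A\kappa(\bar X_{i-1}))\Big],$$

and the second part of the lemma follows.

The proof of the third part follows very similar lines as the first two, and is omitted. ■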
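The step above uses the following fact. The relative entropy of an exponential law with mean $m$, with respect to the unit-mean exponential law $\sigma$, is $g(m)=m\ell(1/m)=m-1-\log m$. The short pure-Python sketch below compares this closed form with a midpoint-rule integral of the log density ratio. The truncation point and step count are ad hoc choices for illustration.

```python
import math

def ell(u):
    """ell(u) = u log u - u + 1, with ell(0) = 1."""
    return u * math.log(u) - u + 1.0 if u > 0 else 1.0

def g(m):
    """g(m) = m * ell(1/m), claimed to equal m - 1 - log m."""
    return m * ell(1.0 / m)

def kl_exponential_numeric(m, steps=100_000):
    """Midpoint-rule approximation of R(Exp(mean m) || Exp(mean 1))."""
    upper = 50.0 * m                           # density is about e^{-50} beyond this point
    h = upper / steps
    total = 0.0
    for k in range(steps):
        t = (k + 0.5) * h
        density = math.exp(-t / m) / m         # density of Exp(mean m)
        log_ratio = -math.log(m) - t / m + t   # log of (density of Exp(mean m)) / e^{-t}
        total += density * log_ratio * h
    return total

for m in (0.5, 1.0, 2.0, 3.7):
    print(m, g(m), m - 1 - math.log(m), kl_exponential_numeric(m))
```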

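Now the Laplace lower bound is straightforward. The definition of $\mu$ in (4.35), the continuity of $\theta$, and the bound (5.3) imply that $x\mapsto R(p(x,\cdot)\,\|\,\alpha(x,\cdot))$ is bounded and continuous. By Lemma 5.2 and the chain rule for relative entropy,

$$\begin{aligned}
&\lim_{T\to\infty}E\Big[F(\bar\eta_T)+\frac1T\sum_{i=1}^{\bar R_T}\big(R(\bar\alpha_{i-1}\,\|\,\alpha)+R(\bar\sigma_i\,\|\,\sigma)\big)\Big]\\
&\quad=\lim_{T\to\infty}E[F(\bar\eta_T)]+\lim_{T\to\infty}\int_S R(p(x,\cdot)\,\|\,\alpha(x,\cdot))\,E\Big[\frac1T\sum_{i=1}^{\bar R_T}\delta_{\bar X_{i-1}}(dx)\Big]+\lim_{T\to\infty}E\Big[\frac1T\sum_{i=1}^{\bar R_T}R(\bar\sigma_i\,\|\,\sigma)\Big]\\
&\quad=F(\eta)+A\,R(\mu\,\|\,[\mu]_1\otimes\alpha)+\int_S\ell(A\kappa(x))\,q(x)\,\eta(dx).
\end{aligned}$$

Returning to the proof of the second part of Lemma 4.5, we find that with this choice of $A$, $\mu$ and $\kappa$, the rate function $I(\eta)$ coincides with $A\,R(\mu\,\|\,[\mu]_1\otimes\alpha)+\int_S\ell(A\kappa(x))\,q(x)\,\eta(dx)$. Note that this $\eta$ corresponds to a special case of Lemma 4.5 in which $\Theta=\{x\in S:\theta(x)=0\}$ is empty. This completes the proof of the Laplace lower bound.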
6 On the boundedness of the rate function

As pointed out in the Introduction, continuous time jump Markov processes differ from the type of processes considered by Donsker and Varadhan in [2], [3]: their dynamics do not have a "diffusive" component, and hence Condition 1.1 does not hold. For jump Markov models, the process only moves when a jump occurs, and there is no continuous change of position. For these processes the rate function is bounded. By contrast, for the processes of [5], [3] the rate function is infinite when the target measure is not absolutely continuous with respect to the reference measure. We now consider the source and implications of this distinction.

Consider a process satisfying all the conditions in Section 2 that has $\pi$ as its invariant distribution. In order to hit a different probability measure $\eta\in\mathcal{P}(S)$, we need to perturb the original dynamics. This includes distorting the Markov chain transition probability $\alpha$ and distorting the exponential holding time distribution $\sigma$. Each of these distortions must pay a relative entropy cost, and the minimum of the (suitably normalized) sum of these costs asymptotically approximates the rate function $I(\eta)$. When $\eta$ is singular with respect to $\pi$, the relative entropy cost from the distortion of $\alpha$ can be made arbitrarily small. The rate function is then almost entirely due to the distortion of $\sigma$. We illustrate this point with the following example.

Recall the model mentioned in the Introduction: the state space $S$ is $[0,1]$, the jump intensity is $q\equiv1$, and for each $x\in[0,1]$, $\alpha(x,\cdot)$ is the uniform distribution on $[0,1]$. The invariant distribution $\pi$ is just the uniform distribution on $[0,1]$. Now consider the Dirac measure $\eta=\delta_{1/2}$ as a target measure. This $\eta$ is not absolutely continuous with respect to $\pi$. However, we can approximate $\eta$ weakly by a sequence of probability measures that are absolutely continuous with respect to $\pi$. For each $n\in\mathbb{N}$ with $n\ge2$, define a probability measure $\eta^n$ by its Radon–Nikodym derivative $\theta^n$ with respect to $\pi$:

$$\theta^n(x)=\begin{cases}n-1&\text{for }x\in\big(\frac12-\frac1{2n},\frac12+\frac1{2n}\big),\\ \frac1{n-1}&\text{otherwise}.\end{cases}$$

Using the formula (3.2) for the rate function, we have

$$I(\eta^n)=1-\Big(\int_0^1(\theta^n)^{1/2}(x)\,dx\Big)^2=1-\frac{4(n-1)}{n^2}.$$

According to the definition of the rate function in Section 3.1, the rate function is bounded above by 1. However, $I(\eta^n)\to1$ as $n\to\infty$, and one can check that this holds for any sequence of absolutely continuous measures converging weakly to $\eta$. Thus $I(\eta)=1$.

We now fix $n\in\mathbb{N}$ and examine the perturbed dynamics that can hit the measure $\eta^n$. This is most easily understood by examining the minimizer in the variational formula for the rate function, whose form was suggested during the proof of the Laplace principle lower bound in Section 5. Recall that $\bar\sigma_i(\cdot)$ and $\bar\alpha_i(\cdot)$ are the perturbed dynamics for the exponential holding time and the Markov chain:

- $\bar\sigma_i(\cdot)$ depends on $\{\bar X_0,\bar\tau_1,\bar X_1,\bar\tau_2,\ldots,\bar X_{i-1}\}$;
- $\bar\alpha_i(\cdot)$ depends on $\{\bar X_0,\bar\tau_1,\bar X_1,\bar\tau_2,\ldots,\bar X_{i-1},\bar\tau_i\}$.

The variables $\bar\tau_i$ and $\bar X_i$ are chosen recursively according to the stochastic kernels $\bar\sigma_i(\cdot)$ and $\bar\alpha_i(\cdot)$. Specifically:

- $\bar s_i$ is defined by (1.10) using the $\bar X_j$ and $\bar\tau_j$;
- $\bar R_T$ is defined by (4.2) using the $\bar s_i$;
- $\bar\eta_T$ is defined by (4.3) using $\bar X_i$, $\bar\tau_i$ and $\bar R_T$.

Following the procedure in Section 5, we first define $\mu\in\mathcal{P}(S\times S)$ as in (4.35). Here $\mu$ is a product measure. As before, we use $[\mu]_1$ to denote the first marginal of $\mu$ and $p$ to denote the regular conditional probability such that $\mu=[\mu]_1\otimes p$. Since $\mu$ is the product measure defined by (4.35), $[\mu]_1$ and $p(x,\cdot)$ are in fact the same measure. Its density with respect to $\pi$ can be calculated as

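$$\frac{d[\mu]_1}{d\pi}(x)=\begin{cases}\frac n2&\text{for }x\in\big(\frac12-\frac1{2n},\frac12+\frac1{2n}\big),\\ \frac{n}{2(n-1)}&\text{otherwise}.\end{cases} \tag{6.1}$$

As in Section 5, let $\bar\alpha_i=p$ for each $i$. A direct calculation of $A$ using formula (4.43) shows that $A=4(n-1)/n^2$. Also, $\kappa$ defined in (5.4) reduces to

$$\kappa(x)=\begin{cases}\frac{n}{2(n-1)}&\text{for }x\in\big(\frac12-\frac1{2n},\frac12+\frac1{2n}\big),\\ \frac n2&\text{otherwise}.\end{cases}$$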

As in Section 5, $\bar\sigma_i$ is the exponential distribution with mean $[A\kappa(\bar X_{i-1})]^{-1}$. Hence, if $\bar X_{i-1}$ falls into $(1/2-1/(2n),1/2+1/(2n))$, $\bar\sigma_i$ is the exponential distribution with mean $n/2$. Otherwise $\bar\sigma_i$ is the exponential distribution with mean $n/[2(n-1)]$. The perturbed Markov jump process, denoted by $\bar X(t)$, is constructed using $\bar\alpha_i$ and $\bar\sigma_i$ defined as above. As shown in Lemma 5.2 and the computation following it, the expected value of the relative entropy cost

$$\frac1T\sum_{i=1}^{\bar R_T}\big(R(\bar\alpha_{i-1}\,\|\,\alpha)+R(\bar\sigma_i\,\|\,\sigma)\big)$$

converges to

$$I(\eta^n)=A\,R(\mu\,\|\,[\mu]_1\otimes\alpha)+\int_0^1\ell(A\kappa(x))\,\eta^n(dx)$$

as $T\to\infty$. We have noted that $p(x,dy)=[\mu]_1(dy)$ and $\alpha(x,dy)=\pi(dy)$. Using (6.1),

$$A\,R(\mu\,\|\,[\mu]_1\otimes\alpha)=A\int_0^1R(p(x,\cdot)\,\|\,\alpha(x,\cdot))\,[\mu]_1(dx)=A\,R([\mu]_1\,\|\,\pi)=\frac{4(n-1)}{n^2}\Big(\log n-\log2-\frac12\log(n-1)\Big).$$

This converges to 0 as $n\to\infty$. Hence the relative entropy cost that comes from the distortion of the Markov chain converges to 0. For the second term, we have

$$\int_0^1\ell(A\kappa(x))\,\eta^n(dx)=\frac{2(n-1)}{n^2}\big(\log(n-1)+2\log2-2\log n\big)-\frac{4(n-1)}{n^2}+1,$$

which converges to 1 as $n\to\infty$. Thus, as $\eta^n$ approaches the target distribution $\eta$, the relative entropy cost from distorting the Markov chain vanishes. The rate function is then determined solely by the relative entropy cost from distorting the exponential waiting times.
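The closed-form expressions in this example are easy to tabulate. The sketch below is pure Python; the function name `example_costs` is our own. From the piecewise-constant densities above it evaluates three quantities:

- the Markov chain cost $A\,R([\mu]_1\,\|\,\pi)$;
- the holding-time cost $\int\ell(A\kappa)\,d\eta^n$;
- $I(\eta^n)=1-4(n-1)/n^2$.

This lets one watch the first term vanish and the second approach 1.

```python
import math

def ell(u):
    """ell(u) = u log u - u + 1, with ell(0) = 1."""
    return u * math.log(u) - u + 1.0 if u > 0 else 1.0

def example_costs(n):
    """Chain cost, holding-time cost and I(eta^n) for the [0,1] example (q = 1, n >= 2)."""
    len_in, len_out = 1.0 / n, 1.0 - 1.0 / n            # pi-mass of the interval and of its complement
    theta_in, theta_out = n - 1.0, 1.0 / (n - 1.0)      # theta^n on and off the interval
    eta_in, eta_out = theta_in * len_in, theta_out * len_out
    A = 4.0 * (n - 1.0) / n**2                          # from (4.43)
    dmu_in, dmu_out = n / 2.0, n / (2.0 * (n - 1.0))    # d[mu]_1/d pi from (6.1)
    kappa_in, kappa_out = dmu_in / theta_in, dmu_out / theta_out
    # A R([mu]_1 || pi); [mu]_1 gives mass 1/2 to the interval and 1/2 to its complement.
    chain_cost = A * (0.5 * math.log(dmu_in) + 0.5 * math.log(dmu_out))
    time_cost = ell(A * kappa_in) * eta_in + ell(A * kappa_out) * eta_out
    rate = 1.0 - 4.0 * (n - 1.0) / n**2                 # I(eta^n)
    return chain_cost, time_cost, rate

for n in (2, 5, 10, 100, 10**6):
    print(n, example_costs(n))
```

For $n=2$ all three numbers are zero, since then $\eta^n=\pi$. For $n=10^6$ the first entry is of order $10^{-5}$ and the second is within $10^{-4}$ of 1.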

One can generalize the argument to more general discrete target measures, where 
one utilizes the underlying dynamics to make sure neighborhoods of the various points 
are visited, and then uses the time dilation to control their relative weight. 
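The perturbed dynamics of the $[0,1]$ example can also be simulated directly. The Monte Carlo sketch below is pure Python, and the function name `simulate_example` is hypothetical. It tracks only whether the state lies in $(1/2-1/(2n),1/2+1/(2n))$. That is all that matters here, since $q\equiv1$ and $p(x,\cdot)=[\mu]_1$ puts mass $1/2$ on that interval. The sketch returns two numbers:

- $\bar R_T/T$, which by Lemma 5.2 should be close to $A=4(n-1)/n^2$;
- the fraction of time spent in the interval, which should be close to $\eta^n$ of the interval, namely $(n-1)/n$.

```python
import random

def simulate_example(n, T, seed=0):
    """Simulate the perturbed jump process of the [0,1] example with q = 1 up to time T.

    Only membership in the interval (1/2 - 1/(2n), 1/2 + 1/(2n)) is tracked: each new
    state lands in it with probability 1/2 (the [mu]_1-mass of the interval), and the
    holding time is exponential with mean n/2 inside and n/(2(n-1)) outside.
    Returns (R_T / T, fraction of [0, T] spent in the interval).
    """
    rng = random.Random(seed)
    mean_in, mean_out = n / 2.0, n / (2.0 * (n - 1.0))
    t, jumps, time_in = 0.0, 0, 0.0
    inside = rng.random() < 0.5                  # initial state drawn from [mu]_1
    while True:
        hold = rng.expovariate(1.0 / (mean_in if inside else mean_out))
        jumps += 1                               # R_T counts the holding interval that contains T
        if inside:
            time_in += min(hold, T - t)
        t += hold
        if t > T:
            break
        inside = rng.random() < 0.5              # next state drawn from p(x, .) = [mu]_1
    return jumps / T, time_in / T

jump_rate, frac = simulate_example(4, 20_000.0, seed=1)
print(jump_rate, frac)   # compare with A = 0.75 and (n - 1)/n = 0.75 for n = 4
```

Long holding times inside the interval are what concentrate the occupation measure near $1/2$, while the embedded chain itself is only mildly distorted.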



7 Appendix

7.1 Proof of inequality (4.30)

Proof. Recall that $R(\mu\,\|\,[\mu]_1\otimes\alpha)<\infty$, where $[\mu]_1=[\mu]_2$ and $\bar\pi$ is invariant under $\alpha$. Additionally, we have Condition 2.5, i.e., there exist an integer $N$ and a real number $c\in(0,\infty)$ such that

$$\alpha^{(N)}(x,\cdot)\le c\,\bar\pi(\cdot) \tag{7.1}$$

for all $x\in S$. Now let $p$ be the regular conditional probability such that $\mu=[\mu]_1\otimes p$. Then

$$R(\mu\,\|\,[\mu]_1\otimes\alpha)=R([\mu]_1\otimes p\,\|\,[\mu]_1\otimes\alpha)<\infty.$$

The chain rule of relative entropy implies that

$$R\big([\mu]_1\otimes\underbrace{p\otimes\cdots\otimes p}_{N}\,\big\|\,[\mu]_1\otimes\underbrace{\alpha\otimes\cdots\otimes\alpha}_{N}\big)=N\cdot R([\mu]_1\otimes p\,\|\,[\mu]_1\otimes\alpha)<\infty. \tag{7.2}$$



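Indeed, write $p^{\otimes n}$ for $p\otimes\cdots\otimes p$ ($n$ copies), and similarly $\alpha^{\otimes n}$. Since $[\mu]_1$ is invariant under $p$, for any integer $n$ the $n$-th marginal of $[\mu]_1\otimes p^{\otimes(n-1)}$ is $[\mu]_1$. Hence (7.2) follows by induction:

$$\begin{aligned}
R\big([\mu]_1\otimes p^{\otimes n}\,\big\|\,[\mu]_1\otimes\alpha^{\otimes n}\big)
&=R\big([\mu]_1\otimes p^{\otimes(n-1)}\,\big\|\,[\mu]_1\otimes\alpha^{\otimes(n-1)}\big)+\int_S R(p\,\|\,\alpha)\,d\big[[\mu]_1\otimes p^{\otimes(n-1)}\big]_n\\
&=(n-1)\cdot R([\mu]_1\otimes p\,\|\,[\mu]_1\otimes\alpha)+\int_S R(p\,\|\,\alpha)\,d[\mu]_1\\
&=n\cdot R([\mu]_1\otimes p\,\|\,[\mu]_1\otimes\alpha).
\end{aligned}$$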
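Identity (7.2) can be checked by brute force on a small finite state space, where the path laws of $[\mu]_1\otimes p^{\otimes N}$ and $[\mu]_1\otimes\alpha^{\otimes N}$ can be enumerated. In the sketch below, `mu` is a random symmetric joint law, so $[\mu]_1=[\mu]_2$ and $[\mu]_1$ is invariant under $p$. `alpha` is an arbitrary positive kernel. Both are hypothetical test data.

```python
import itertools
import numpy as np

def kl(p, q):
    """Relative entropy R(p || q) of nonnegative arrays with p << q, using 0 log 0 = 0."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def path_law(init, kernel, N):
    """Probabilities of all paths (x_0, ..., x_N), in a fixed order, for a chain with initial law init."""
    n = len(init)
    probs = []
    for path in itertools.product(range(n), repeat=N + 1):
        prob = init[path[0]]
        for a, b in zip(path[:-1], path[1:]):
            prob *= kernel[a, b]
        probs.append(prob)
    return np.array(probs)

rng = np.random.default_rng(5)
n, N = 3, 3
M = rng.random((n, n)); M = M + M.T
mu = M / M.sum()                          # symmetric joint law, so [mu]_1 = [mu]_2
mu1 = mu.sum(axis=1)
p = mu / mu1[:, None]                     # mu = [mu]_1 x p; [mu]_1 is invariant under p
alpha = rng.random((n, n)); alpha /= alpha.sum(axis=1, keepdims=True)   # hypothetical reference kernel

lhs = kl(path_law(mu1, p, N), path_law(mu1, alpha, N))
rhs = N * kl(mu.ravel(), (mu1[:, None] * alpha).ravel())
print(lhs, rhs)
```

Each of the $N$ transitions contributes the same expected cost because the chain started from $[\mu]_1$ stays distributed as $[\mu]_1$.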

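Let $\nu_{k|j}$ denote the conditional probability of the $k$-th argument of $\nu$ given the $j$-th argument of $\nu$. Note that one can define a mapping from $\mathcal{P}(S^{N+1})$ to $\mathcal{P}(S^2)$ that sends each $\nu\in\mathcal{P}(S^{N+1})$ to $[\nu]_1\otimes\nu_{N+1|1}$. Since relative entropy can only decrease under such induced measures, (7.2) implies

$$R\big([\mu]_1\otimes p^{(N)}\,\big\|\,[\mu]_1\otimes\alpha^{(N)}\big)<\infty.$$

Now since $[\mu]_1$ is invariant under $p$, it is also invariant under $p^{(N)}$, and therefore the second marginal of $[\mu]_1\otimes p^{(N)}$ is $[\mu]_1$. Using the chain rule of relative entropy again gives

$$R\big([\mu]_1\,\|\,[\mu]_1\alpha^{(N)}\big)\le R\big([\mu]_1\otimes p^{(N)}\,\big\|\,[\mu]_1\otimes\alpha^{(N)}\big)<\infty,$$

where $[\mu]_1\alpha^{(N)}(dy)=\int_S\alpha^{(N)}(x,dy)\,[\mu]_1(dx)$ is the second marginal of $[\mu]_1\otimes\alpha^{(N)}$. This implies (4.30), since

$$\infty>R\big([\mu]_1\,\|\,[\mu]_1\alpha^{(N)}\big)=R([\mu]_1\,\|\,\bar\pi)-\int_S\log\frac{d([\mu]_1\alpha^{(N)})}{d\bar\pi}\,d[\mu]_1\ge R([\mu]_1\,\|\,\bar\pi)-\log c,$$

where $c$ is from (7.1).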



7.2 Proof of Lemma 5.1

The proof of the representation is standard, save for the fact that $R_T$ is random. We include a proof here for completeness.

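Proof. Define for each $k\in\mathbb{N}$

$$\eta_T^k(\cdot)=\frac1T\Big[\sum_{i=1}^{(R_T\wedge k)-1}\frac{\tau_i}{q(X_{i-1})}\,\delta_{X_{i-1}}(\cdot)+\Big(T-\sum_{i=1}^{(R_T\wedge k)-1}\frac{\tau_i}{q(X_{i-1})}\Big)\delta_{X_{(R_T\wedge k)-1}}(\cdot)\Big].$$

For any measure $\nu^k\in\mathcal{P}((S\times\mathbb{R}_+)^k)$, we can decompose $\nu^k$ as

$$\nu^k=\bar\alpha_0\otimes\bar\sigma_1\otimes\bar\alpha_1\otimes\bar\sigma_2\otimes\cdots\otimes\bar\alpha_{k-1}\otimes\bar\sigma_k. \tag{7.3}$$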



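Choose the barred random variables $\bar X_i$ and $\bar\tau_i$ according to $\bar\alpha_i$ and $\bar\sigma_i$ as before, and define the corresponding $\bar R_T\wedge k$ as follows. If $\sum_{i=1}^k\bar\tau_i/q(\bar X_{i-1})>T$, then $\bar R_T\wedge k=\bar R_T$, where $\bar R_T$ is the integer that satisfies

$$\sum_{i=1}^{\bar R_T-1}\frac{\bar\tau_i}{q(\bar X_{i-1})}\le T<\sum_{i=1}^{\bar R_T}\frac{\bar\tau_i}{q(\bar X_{i-1})};$$

otherwise define $\bar R_T\wedge k=k$. We also define

$$\bar\eta_T^k(\cdot)=\frac1T\Big[\sum_{i=1}^{(\bar R_T\wedge k)-1}\frac{\bar\tau_i}{q(\bar X_{i-1})}\,\delta_{\bar X_{i-1}}(\cdot)+\Big(T-\sum_{i=1}^{(\bar R_T\wedge k)-1}\frac{\bar\tau_i}{q(\bar X_{i-1})}\Big)\delta_{\bar X_{(\bar R_T\wedge k)-1}}(\cdot)\Big]. \tag{7.4}$$

Let $\gamma^k\in\mathcal{P}((S\times\mathbb{R}_+)^k)$ denote the multi-dimensional probability measure corresponding to the original dynamics, i.e., the joint law of $(X_0,\tau_1,X_1,\tau_2,\ldots,X_{k-1},\tau_k)$ under the original dynamics.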
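Applying Lemma 3.2 then gives

$$-\frac1T\log E\big[\exp\{-TF(\eta_T^k)\}\big]=\inf_{\nu^k\in\mathcal{P}((S\times\mathbb{R}_+)^k)}\Big[E\big[F(\bar\eta_T^k)\big]+\frac1T R(\nu^k\,\|\,\gamma^k)\Big]. \tag{7.5}$$

By applying Theorem 3.3 repeatedly to $R(\nu^k\,\|\,\gamma^k)$ we obtain

$$R(\nu^k\,\|\,\gamma^k)=E\Big[\sum_{i=1}^k\big(R(\bar\alpha_{i-1}\,\|\,\alpha)+R(\bar\sigma_i\,\|\,\sigma)\big)\Big].$$

We can thus rewrite (7.5) as

$$-\frac1T\log E\big[\exp\{-TF(\eta_T^k)\}\big]=\inf_{\nu^k\in\mathcal{P}((S\times\mathbb{R}_+)^k)}E\Big[F(\bar\eta_T^k)+\frac1T\sum_{i=1}^k\big(R(\bar\alpha_{i-1}\,\|\,\alpha)+R(\bar\sigma_i\,\|\,\sigma)\big)\Big]. \tag{7.6}$$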

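Now for each $\nu^k\in\mathcal{P}((S\times\mathbb{R}_+)^k)$, we construct another measure $\tilde\nu^k\in\mathcal{P}((S\times\mathbb{R}_+)^k)$ recursively as follows. Define $\tilde\alpha_0=\bar\alpha_0$ and $\tilde\sigma_1=\bar\sigma_1$. For all $2\le i\le k$, define $\tilde\alpha_{i-1}$ and $\tilde\sigma_i$ by

$$(\tilde\alpha_{i-1},\tilde\sigma_i)=\begin{cases}(\bar\alpha_{i-1},\bar\sigma_i)&\text{if }\sum_{j=1}^{i-1}\tilde\tau_j/q(\tilde X_{j-1})\le T,\\ (\alpha,\sigma)&\text{otherwise},\end{cases}$$

where $\tilde X_j$ and $\tilde\tau_j$ denote the variables generated under $\tilde\nu^k$. Thus we return to the original dynamics, at zero relative entropy cost, after $\bar R_T$. Define $\tilde\nu^k$ using $\tilde\alpha_i$ and $\tilde\sigma_i$ by (7.3). From the definition (7.4) we have $E[F(\tilde\eta_T^k)]=E[F(\bar\eta_T^k)]$, and

$$E\Big[\sum_{i=1}^k\big(R(\tilde\alpha_{i-1}\,\|\,\alpha)+R(\tilde\sigma_i\,\|\,\sigma)\big)\Big]=E\Big[\sum_{i=1}^{\bar R_T\wedge k}\big(R(\bar\alpha_{i-1}\,\|\,\alpha)+R(\bar\sigma_i\,\|\,\sigma)\big)\Big]\le E\Big[\sum_{i=1}^k\big(R(\bar\alpha_{i-1}\,\|\,\alpha)+R(\bar\sigma_i\,\|\,\sigma)\big)\Big].$$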



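Hence we can rewrite (7.6) as

$$-\frac1T\log E\big[\exp\{-TF(\eta_T^k)\}\big]=\inf_{\nu^k\in\mathcal{P}((S\times\mathbb{R}_+)^k)}E\Big[F(\bar\eta_T^k)+\frac1T\sum_{i=1}^{\bar R_T\wedge k}\big(R(\bar\alpha_{i-1}\,\|\,\alpha)+R(\bar\sigma_i\,\|\,\sigma)\big)\Big]. \tag{7.7}$$

Both $R_T\wedge k\to R_T$ and $\bar R_T\wedge k\to\bar R_T$ pointwise as $k\to\infty$. Hence, by the dominated convergence theorem,

$$\lim_{k\to\infty}-\frac1T\log E\big[\exp\{-TF(\eta_T^k)\}\big]=-\frac1T\log E\big[\exp\{-TF(\eta_T)\}\big],$$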
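$$\lim_{k\to\infty}E\big[F(\bar\eta_T^k)\big]=E\big[F(\bar\eta_T)\big].$$

Also, by the monotone convergence theorem,

$$\lim_{k\to\infty}E\Big[\sum_{i=1}^{\bar R_T\wedge k}\big(R(\bar\alpha_{i-1}\,\|\,\alpha)+R(\bar\sigma_i\,\|\,\sigma)\big)\Big]=E\Big[\sum_{i=1}^{\bar R_T}\big(R(\bar\alpha_{i-1}\,\|\,\alpha)+R(\bar\sigma_i\,\|\,\sigma)\big)\Big].$$

Hence by taking limits on both sides of (7.7), we arrive at

$$-\frac1T\log E\big[\exp\{-TF(\eta_T)\}\big]=\inf E\Big[F(\bar\eta_T)+\frac1T\sum_{i=1}^{\bar R_T}\big(R(\bar\alpha_{i-1}\,\|\,\alpha)+R(\bar\sigma_i\,\|\,\sigma)\big)\Big],$$

where the infimum is taken over all control measures $\{\bar\alpha_i,\bar\sigma_i\}$. This proves the lemma. ■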